Abstract — The problem of security against timing-based traffic
analysis in wireless networks is considered in this work. An
analytical measure of anonymity in eavesdropped networks is
proposed using the information-theoretic concept of equivocation.
For a physical layer with orthogonal transmitter directed signaling,
scheduling and relaying techniques are designed to maximize
achievable network performance for any given level of anonymity.
The network performance is measured by the achievable relay
rates from the sources to destinations under latency and medium
access constraints. In particular, analytical results are presented
for two scenarios:

For a two-hop network with maximum anonymity, achievable
rate regions for a general $m \times 1$ relay are characterized when
nodes generate independent Poisson transmission schedules. The 
rate regions are presented for both strict and average delay 
constraints on traffic flow through the relay. 

For a multihop network with an arbitrary anonymity re- 
quirement, the problem of maximizing the sum-rate of flows 
(network throughput) is considered. A selective independent 
scheduling strategy is designed for this purpose, and using 
the analytical results for the two-hop network, the achievable 
throughput is characterized as a function of the anonymity level. 
The throughput-anonymity relation for the proposed strategy is
shown to be equivalent to an information-theoretic rate-distortion
function.

Index Terms — Network Security, Traffic Analysis, Secrecy, 
Rate-Distortion. 



I. Introduction 

Traffic analysis attacks are carried out by eavesdroppers
monitoring node transmissions to obtain networking information
such as source-destination pairs and paths of data flow.
Traffic analysis has played a prominent role in modern warfare
[1], and its adverse effects on computer networks are well
documented in the literature [2], [3], [4], [5]. For example, the
weaknesses of protocols for web browsing [4], [6] and SSH
[7] have been exposed through traffic analysis.

The primary focus of this work is an analytical approach
to security against traffic analysis in wireless networks and
the design of provably secure countermeasures. Owing to the
unprotected medium of communication, eavesdropping on node
transmissions in wireless networks is easy and undetectable.
Although cryptography can be used to prevent analysis based
on contents or packet lengths (see Section I-B), the knowledge
of transmission epochs alone can reveal critical information
such as paths of information flow. We address the problem
of designing anonymous transmission schedules and relaying
strategies to counter the transmission epoch based inference
of data flows by eavesdroppers.

The challenge in designing anonymous transmission strate- 
gies is to adhere to the networking constraints while hiding 
information from eavesdroppers. Wireless networks are subject 
to constraints on medium access, latency and stability, which 
generally result in a high correlation across transmission 
schedules of nodes in a path. The need for anonymity however 
necessitates that paths are not revealed by correlation of 
transmission schedules. These contrasting paradigms result 
in a tradeoff between anonymity and network performance. 
For example, consider the simple two-hop setup shown in
Fig. 1, wherein node $B$ relays packets received from nodes
$S_1$ and $S_2$ subject to a strict delay constraint. Assuming
the nodes use orthogonal channels, if the transmission rates
$R_1, R_2$ are bounded, then the rates of packets that can be
relayed successfully are given by a pentagon (solid line in Fig.
1). Rates in this region are achieved if the relay transmits
every received packet after a small processing delay. It is easy
to see that such a strategy results in a high correlation
between the source and relay schedules. If, in addition to
the networking constraints, the source and relay schedules are
forced to be statistically independent, an eavesdropper would
not detect correlation across schedules, thus hiding the relaying
operation. The delay constraint may, however, result in packet
drops or require dummy transmissions, thereby reducing the
achievable relay rates.

The relaying operation of Fig. 1 represents the basic
component in wireless networking, and the characterization
of the achievable rate region with provable anonymity is one
of the contributions of this work. The example highlights that
providing anonymity in communication requires a reduction 
in communication rates. A primary goal of this work is to 
characterize this trade-off between anonymity and network 
performance. An analytical approach for the characterization 
requires a quantifiable notion of anonymity, which we mea- 
sure using the uncertainty in networking information (ac- 
tive routes in the network) inferable by the adversary. The 
example discussed suggests a simple technique to provide 
perfect anonymity by letting all nodes generate statistically 
independent schedules, but this strategy may not provide 
scalable performance for large networks. Our goal is to design 
transmission strategies that sacrifice minimum performance 
while maintaining a certain level of anonymity. 
Fig. 1: Two Hop Relay Network.
(a) Sources $S_1, S_2$ transmit packets to destinations $D_1, D_2$ through relay $B$.
(b) Achievable rate region: the horizontal and vertical boundaries ($R_1 \le C_1$, $R_2 \le C_2$)
are due to the rate constraints $C_1, C_2$ on nodes $S_1, S_2$. The sum-rate constraint is due to
stability at relay $B$. The inner region (dotted line) represents the achievable rate region with
independent scheduling that we wish to characterize.



A. Main Contributions 

We propose an analytical framework for anonymous 
scheduling against traffic analysis in wireless networks. In 
particular, we define a mathematical notion for anonymity of 
routes, based on Shannon's equivocation [8], when eavesdrop- 
pers observe transmission epochs of all nodes in the network. 
The main results obtained under this model are divided into 
two segments. 

Assuming a maximum anonymity requirement, we design
scheduling and relaying strategies for a two-hop multiple-source
single-relay system (see Fig. 1) when nodes use
orthogonal transmitter directed signaling. In particular, when
the transmission schedules of nodes are independent Poisson
processes, we characterize the achievable rate region analytically.
Although independent Poisson scheduling may not be
optimal for a strict delay constraint on the relay, we show
that, under certain physical layer conditions, the achievable
relay rates are optimal for an average delay constraint.

For a general multihop network, we propose a randomized
scheduling strategy for any given level of anonymity $a$ and,
utilizing the results of the two-hop system, characterize the
achievable sum-rate of data flows as a function of $a$. Our key
result in this framework shows the equivalence between the
sum-rate anonymity tradeoff and information-theoretic rate-distortion.

The connection between rate distortion and anonymous
networking is not tied to our strategy and can be explained
using a general intuition. The objective of the rate-distortion
problem is to generate the fewest codewords for a set of
source sequences such that the corresponding reconstruction
sequences satisfy a specified distortion constraint. The idea
is to divide the set of source sequences into the fewest
bins such that the distortion between each sequence in a
bin and the reconstruction sequence is less than the specified
constraint. Alternatively, fixing the code rate fixes the total
number of bins. Then, the sequences are placed optimally
within each bin such that the corresponding reconstruction
sequences minimize the expected distortion.
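As a concrete textbook instance of the rate-distortion tradeoff described above (not the throughput-anonymity function derived in this paper), a Bernoulli($p$) source under Hamming distortion has $R(D) = H_b(p) - H_b(D)$ for $0 \le D < \min(p, 1-p)$ and $R(D) = 0$ otherwise. The sketch below evaluates it; the function names are illustrative only.

```python
import math

def binary_entropy(p: float) -> float:
    """H_b(p) in bits, using the convention 0 * log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bernoulli_rate_distortion(p: float, d: float) -> float:
    """R(D) in bits/symbol for a Bernoulli(p) source with Hamming distortion."""
    if d >= min(p, 1 - p):
        # Distortion budget is large enough to send nothing and always
        # guess the more likely symbol.
        return 0.0
    return binary_entropy(p) - binary_entropy(d)

for d in (0.0, 0.05, 0.11, 0.25, 0.5):
    print(f"D={d:.2f}  R(D)={bernoulli_rate_distortion(0.5, d):.3f} bits")
```

Fewer bits per symbol (fewer bins) buy a larger expected distortion, which loosely mirrors how a stronger anonymity requirement (fewer distinguishable bins of sessions) costs throughput in the networking analogy.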

In the anonymous networking setup, let the set of active 
routes at any given time be referred to as a network session. 
The key idea is to divide the set of all possible network 
sessions into bins such that, for each bin, there exists a 
scheduling strategy that would make the sessions within 
that bin indistinguishable to an eavesdropper. The level of 
anonymity required determines the number of bins, and the 
optimal scheduling strategy plays the role of the reconstruction 
sequence by minimizing the performance loss across sessions 
within the bin. 

B. Related Work 

Although prevention of traffic analysis is a classical prob- 
lem, a dominant portion of prior research has centered around 
Internet applications. In that regard, an important countermea- 
sure was provided by Chaum through the concept of the traffic 
Mix [9]. A Mix node uses re-encryption and packet padding 
to prevent correlation based on contents or lengths across 
packets. Further, by batching and reordering packets, the Mix 
provides anonymity of source-destination pairs. Subsequent 
improvements in the anonymity provided by the Mix included 
random delaying (Stop-and-Go Mixes [10]) and introducing 
dummy packets (ISDN Mixes [11]). The concept of Mixes 
was successfully used in designing remailer and proxy systems 
[12], [13], [14] for the Internet. 

Although Mixes provide an ideal solution for many Internet
applications, it was shown [15] that, when strict constraints on
delay or buffer size are imposed, a Mix no longer provides
anonymity to long streams of traffic. An alternative approach,
designed primarily for multihop wireless networks, is that of
deterministic scheduling [16]. In [16], the authors propose a
fixed periodic schedule for the entire network, wherein every
node adheres to the schedule by transmitting dummy packets
whenever actual data is not present. Although the idea of
fixed scheduling can be adapted to handle delay constraints,
constant transmission of dummy packets is inefficient and,
furthermore, the centralized synchronous implementation is
impractical for ad hoc wireless networks.

A key component of our approach is the analytical model 
for anonymity of routes. In mix networks, anonymity has been 
measured using the size or entropy of the anonymity set (set 
of possible source-destination pairs) of an observed packet. 
In the context of this work, the use of anonymity sets has
two disadvantages. First, hiding source-destination pairs alone
may not be sufficient; the direction of data flow could also
reveal critical information. Second, the measure of anonymity
needs to cater to streams of packets rather than a single packet
[15]. Our metric for anonymity is based on the information-theoretic
notion of equivocation, proposed by Shannon [8].
Previous applications of equivocation measured the secrecy of 
transmitted data on point-to-point channels [17], [18], whereas 
we use equivocation to measure the secrecy of routes in a 
network. 

Prevention of traffic analysis can also be viewed as the 
complementary problem to intrusion detection [19], which 
is another important area in network security. Some of the 
techniques we use to design anonymous relaying strategies 
are motivated by prior work on stepping stone detection [20]. 

II. Analytical Model 

The main problem addressed in this paper is to design 
transmission and relaying strategies that are resilient to traffic 
analysis and use them to characterize the relationship between 
achievable network performance and the level of anonymity. 
We consider a specific category of delay-sensitive traffic and
measure the network performance using achievable packet
relay rates from source to destination.

A. Notation 

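Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a directed graph, where $\mathcal{V}$ is the
set of nodes in the network and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set
of directed links. If $(A, B) \in \mathcal{E}$, then node
$B$ can receive transmissions from node $A$. A sequence of
nodes $P = (V_1, \dots, V_n) \in \mathcal{V}^*$ is a valid path¹ in $\mathcal{G}$ if
$(V_i, V_{i+1}) \in \mathcal{E}$ for all $i < n$. The set of all possible paths in $\mathcal{G}$
is denoted by $\mathcal{P}(\mathcal{G})$.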

We assume that during any network observation by the
eavesdropper, a subset of nodes communicates using a fixed
set of paths. This set of paths $S \in 2^{\mathcal{P}(\mathcal{G})}$ is referred to as a
network session. The information that we wish to hide from
the eavesdropper is the network session $S$. We model $S$ as
an i.i.d. random variable with a probability mass function
$\{p(s) : s \in 2^{\mathcal{P}(\mathcal{G})}\}$. Therefore, the set of all possible sessions
is given by

$$\mathcal{S} = \{s \in 2^{\mathcal{P}(\mathcal{G})} : p(s) > 0\}.$$

The prior information $p(S)$ on sessions can be obtained
using the topology and applications of the particular network,
and is also available to the eavesdropper.

For example, in a simple network $\mathcal{G}_1$ as shown in Figure 2,
let $S_1, S_2$ be the only allowed sources and $D_1, D_2$ the allowed
destinations. Further, let the sources always communicate with
distinct destinations. For such a network, $\mathcal{P}(\mathcal{G}_1)$, the set of all
possible paths, is given by

$$\mathcal{P}(\mathcal{G}_1) = \{(S_1,B), (S_1,B,D_1), (S_1,B,D_2), (S_2,B), (S_2,B,D_1), (S_2,B,D_2), (B,D_1), (B,D_2)\}.$$

Due to the restriction on distinct destinations, the set of valid
sessions $\mathcal{S}$ contains only two sessions:

$$\mathcal{S} = \big\{ \{(S_1,B,D_1), (S_2,B,D_2)\},\ \{(S_1,B,D_2), (S_2,B,D_1)\} \big\}.$$
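The example can be checked mechanically. The sketch below encodes $\mathcal{G}_1$ from Fig. 2, enumerates its valid paths, and builds the distinct-destination sessions. It assumes loop-free paths with at least two nodes, which matches the list above for this acyclic graph (in general $\mathcal{V}^*$ also admits repeated nodes); the helper names and the fixed relay `"B"` are illustrative.

```python
from itertools import permutations

# Fig. 2 network G_1 = (V_1, E_1)
V1 = ["S1", "S2", "B", "D1", "D2"]
E1 = {("S1", "B"), ("S2", "B"), ("B", "D1"), ("B", "D2")}

def is_valid_path(path, edges):
    """(V_1, ..., V_n) is valid if every consecutive pair is a directed link."""
    return all((path[k], path[k + 1]) in edges for k in range(len(path) - 1))

def all_paths(nodes, edges):
    """Enumerate loop-free valid paths with at least two nodes."""
    found = set()

    def extend(path):
        if len(path) >= 2:
            found.add(tuple(path))
        for u, v in edges:
            if u == path[-1] and v not in path:
                extend(path + [v])

    for node in nodes:
        extend([node])
    return found

def distinct_destination_sessions(sources, dests, relay, edges):
    """Sessions in which each source reaches a distinct destination via the relay."""
    sessions = []
    for choice in permutations(dests, len(sources)):
        paths = frozenset((s, relay, d) for s, d in zip(sources, choice))
        if all(is_valid_path(p, edges) for p in paths):
            sessions.append(paths)
    return sessions

P_G1 = all_paths(V1, E1)
S_G1 = distinct_destination_sessions(["S1", "S2"], ["D1", "D2"], "B", E1)
print(len(P_G1), len(S_G1))  # prints: 8 2
```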

¹The notation $\mathcal{V}^*$ refers to $\bigcup_{k \ge 1} \mathcal{V}^k$.
Fig. 2: Two Node Switching Network: $\mathcal{G}_1 = (\mathcal{V}_1, \mathcal{E}_1)$, where

$\mathcal{V}_1 = \{S_1, S_2, B, D_1, D_2\}$,

$\mathcal{E}_1 = \{(S_1,B), (S_2,B), (B,D_1), (B,D_2)\}$.

Transmission Schedules: The eavesdropper's observation consists
of the packet transmission epochs in a session. Since it
is not possible to determine the location of the eavesdropper(s),
we assume that all transmissions are being monitored.
Although the packets are encrypted, depending on the physical 
layer model, it may be possible for an eavesdropper to infer 
partial information about sender-receiver nodes of packets 
by merely detecting a transmission. We consider one such 
physical layer model known as a transmitter directed signaling 
model. 

Transmitter Directed Signaling: All packets transmitted by 
a particular node are modulated using the same spreading 
sequence, and each transmitting node is associated with 
a unique orthogonal spreading sequence. Under this 
transmission scheme, an eavesdropper would be able to 
"tune" his detector to a particular spreading sequence 
and detect the transmission times of packets sent by the 
corresponding node. Although he knows the transmitting 
node of each packet, we assume that headers are encrypted, 
so he would not know the intended recipient of any packet. 
Therefore, in a route involving multiple nodes, even when all 
transmission schedules are correlated, it is not possible for an 
eavesdropper to ascertain the final destination node. 

Eavesdropper Observation: Let $\mathcal{Y}_A$ represent the schedule of
packets transmitted by node $A$. The schedule $\mathcal{Y}_A$ is a point
process,

$$\mathcal{Y}_A = \{Y_A(1), Y_A(2), \dots\},$$

where $Y_A(i)$ represents the transmission epoch of the $i$th
packet by node $A$. The eavesdropper detects packet transmission
epochs which, by virtue of unique orthogonal codes,
would provide him the identity of the transmitting node.
Since we assume all nodes are monitored, the eavesdropper's
complete observation is given by $\mathcal{Y} = \{\mathcal{Y}_A : A \in \mathcal{V}\}$.

Note that, while $\mathcal{Y}$ represents the schedules of packet
transmissions detected by eavesdroppers, it does not specify
which packets are relayed from sources to destinations in a
session. In fact, some of the epochs in $\mathcal{Y}$ could represent
dummy transmissions by nodes.

B. Anonymity Measure 

We model $\mathcal{Y}$ as a random sequence of epochs with conditional
distribution $q(\mathcal{Y}|S)$. The idea is to design $q(\mathcal{Y}|S)$
such that eavesdroppers obtain minimum information about
the session $S$ by observing $\mathcal{Y}$. Based on the information we
wish to hide ($S$) and the observation of the eavesdropper ($\mathcal{Y}$),
we use equivocation [8] to define the analytical measure of
anonymity.

Definition 1: A distribution $q(\mathcal{Y}|S)$ is defined to have
anonymity $a$ if

$$\frac{H(S|\mathcal{Y})}{H(S)} \ge a.$$

When $a = 1$, the distribution $q(\mathcal{Y}|S)$ is defined to have
perfect anonymity. For a distribution with perfect anonymity,
given the observed schedules, the eavesdropper gains no
information about the routes beyond the prior $p(S)$.
In other words,

$$H(S|\mathcal{Y}) = H(S).$$

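For a general $a$, a physical interpretation of anonymity
can be obtained using Fano's Inequality [21]: Let the error
probability of the eavesdropper in decoding the session $S$ be
$P_e$. Then,

$$P_e \ge \frac{H(S|\mathcal{Y}) - 1}{\log|\mathcal{S}|} \ge \frac{a H(S) - 1}{\log|\mathcal{S}|}.$$

Furthermore, if $\mathcal{S}$ is a large set with uniform prior ($p(s) =
\frac{1}{|\mathcal{S}|}$ for all $s$), then $H(S) = \log|\mathcal{S}|$, so that
$P_e \ge a - \frac{1}{\log|\mathcal{S}|}$, which approaches $a$ as $|\mathcal{S}|$ grows.
In other words, the anonymity bounds
the minimum probability of error incurred by the eavesdropper
in decoding $S$.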
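For a finite toy model, Definition 1 and the Fano bound can be computed directly from a joint pmf of $(S, \mathcal{Y})$. The sketch below is illustrative: the pmf is a hypothetical example in which the observation leaks two of four bits about a uniformly chosen session, and entropies are in bits so that the "$-1$" in Fano's bound applies.

```python
import math
from collections import defaultdict

def entropy(pmf):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def anonymity(joint):
    """H(S|Y) / H(S) for a joint pmf given as {(s, y): probability}."""
    p_s, p_y = defaultdict(float), defaultdict(float)
    for (s, y), p in joint.items():
        p_s[s] += p
        p_y[y] += p
    # Chain rule: H(S|Y) = H(S, Y) - H(Y)
    h_s_given_y = entropy(joint) - entropy(p_y)
    return h_s_given_y / entropy(p_s)

def fano_lower_bound(a, h_s, num_sessions):
    """Lower bound (a * H(S) - 1) / log2|S| on the eavesdropper's error probability."""
    return (a * h_s - 1) / math.log2(num_sessions)

# Hypothetical example: 16 equally likely sessions; the observation reveals s // 4.
leaky = {(s, s // 4): 1 / 16 for s in range(16)}
a = anonymity(leaky)
print(a, fano_lower_bound(a, 4.0, 16))  # prints: 0.5 0.25
```

An observation independent of $S$ gives $a = 1$ (perfect anonymity), while $\mathcal{Y} = S$ gives $a = 0$.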

This notion of anonymity that we consider is different 
from previous definitions [22], [10], which were primarily 
used to hide the source-destination pair of each individual 
packet. To the best of our knowledge, this is the first definition 
of anonymity that deals with multihop routes and considers 
timing information in long streams of transmitted packets. 



C. Network Constraints and Throughput 

The key challenge in designing the schedule distribution
$q(\mathcal{Y}|S)$ with provable anonymity is to sacrifice minimum
performance under the networking constraints. In this work,
we measure performance using the achievable rates of packets
relayed from sources to destinations subject to constraints
on medium access and latency, which are described as follows.

Medium Access Constraints: Wireless networks, due to restrictions
on shared bandwidth and transmission power, pose
constraints on rates of packets transmitted and received. We
consider long streams of packet transmissions, and measure
the rate of packets transmitted using an asymptotic measure:

$$T_A = \lim_{n \to \infty} \frac{n}{Y_A(n)}, \qquad (1)$$

where $T_A$ denotes the rate of packets transmitted by a node $A$.
Since each transmitting node is associated with an orthogonal
spreading sequence, the constraint on each point process in
$\mathcal{Y}$ is independent. Specifically, the transmission rate $T_A$ of a
node $A$ is bounded by a constant $C_A$, which depends on the
characteristics of the medium and the transmission capability
of node $A$. As long as $T_A \le C_A$, successful reception is
guaranteed at the intended receiver.



We assume that the network operates in full-duplex
mode, where every node can transmit and receive packets
simultaneously as long as all transmission rates are within
the specified bounds. In other words, a set of schedules $\mathcal{Y}$ is
a valid network schedule if and only if $T_A \le C_A$ for every
node $A$.
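The validity condition can be checked on finite traces. The sketch below assumes that a finite-$n$ estimate $n / Y_A(n)$ stands in for the limit in (1); the schedules, capacities, and function names are hypothetical.

```python
def empirical_rate(epochs):
    """Finite-n version of (1): T_A is approximated by n / Y_A(n)."""
    return len(epochs) / epochs[-1] if epochs else 0.0

def is_valid_network_schedule(schedules, capacities):
    """A set of schedules is valid iff T_A <= C_A for every node A."""
    return all(empirical_rate(schedules[a]) <= capacities[a] for a in schedules)

Y = {"S1": [0.5, 1.0, 1.5, 2.0], "B": [0.4, 1.2, 2.0]}
C = {"S1": 2.0, "B": 1.0}
print(empirical_rate(Y["S1"]), empirical_rate(Y["B"]), is_valid_network_schedule(Y, C))
# prints: 2.0 1.5 False
```

Here node $B$ transmits at an estimated rate of 1.5 against a bound of 1.0, so the schedule set is not valid.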

Latency Constraint: We consider a strict delay constraint
on the packets, where the packet delay at each intermediate
relay in a route is bounded by $\Delta$. In general, each relay
is allowed to re-encrypt packets, reorder arrived packets and
transmit dummy packets. However, each received data packet
at a relay is required to be forwarded within $\Delta$ time units of
arrival, or otherwise dropped. Such a strict delay constraint
would apply in practice to time-sensitive applications such as
target tracking in sensor networks or streaming media in
peer-to-peer networks. In general, a strict delay constraint would
prevent congestion in the network and ensure stability, albeit
at the cost of dropped packets.

Note that the schedules in $\mathcal{Y}$ only specify when packets
are transmitted by each node, and do not indicate which
packets actually travel from source to destination on each
route of a session. For every schedule, we therefore need
to specify a relaying strategy, represented by $Z$, which is a
set of subsequences of $\mathcal{Y}$. The subsequences represent the
transmission epochs of packets that are relayed from sources
to destinations and, therefore, depend on the routes of the
session as well as the delay constraint.

Definition 2: Let a session $S = (P_1, \dots, P_{|S|})$, where $P_i
= (A(i,1), \dots, A(i,m(i)))$ is a valid path of length $m(i)$,
and $A(i,j) \in \mathcal{V}$ represents the $j$th node in path $P_i$ of session
$S$. A set of subsequences $Z = \{Z_{i,j} : i \le |S|, j < m(i)\}$ of
$\mathcal{Y}$ is a valid relaying strategy for $S$ if:

1. $\forall i, j$: $Z_{i,j} \subseteq \mathcal{Y}_{A(i,j)}$.

2. For every $i, j, n$: $0 < Z_{i,j+1}(n) - Z_{i,j}(n) \le \Delta$.

3. If $(A(i,j), A(i,j+1)) = (A(l,m), A(l,m+1))$, then
$Z_{i,j} \cap Z_{l,m} = \emptyset$.

In the above definition, condition 2 ensures that the relayed
packets satisfy the delay constraint $\Delta$ at every intermediate
relay from the sources to the destinations of the session.
Condition 3 ensures that, if any pair of nodes is common to
multiple routes, the subsequences picked from the transmission
schedules are mutually exclusive.
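The three conditions of Definition 2 translate into a direct check on finite traces. In the sketch below, indices are 0-based, `Z[(i, j)]` is assumed to hold the sorted epochs $Z_{i,j}$ for the transmitting nodes of path $i$, and all names and data are illustrative.

```python
def is_valid_relaying(session, Z, schedules, delta):
    """Check conditions 1-3 of Definition 2 (0-based indices i, j).

    session:   list of paths, each a tuple of node names
    Z:         dict (i, j) -> sorted epochs Z_{i,j}, for j = 0 .. len(path) - 2
    schedules: dict node -> list of transmission epochs Y_A
    """
    claimed = {}  # link (A, B) -> epochs already assigned on that link
    for i, path in enumerate(session):
        hops = len(path) - 1  # transmitting nodes A(i,1), ..., A(i,m(i)-1)
        for j in range(hops):
            z = Z[(i, j)]
            if not set(z) <= set(schedules[path[j]]):
                return False  # condition 1: Z_{i,j} must come from Y_{A(i,j)}
            link = (path[j], path[j + 1])
            if claimed.setdefault(link, set()) & set(z):
                return False  # condition 3: a shared link needs disjoint subsequences
            claimed[link].update(z)
            if j + 1 < hops:
                nxt = Z[(i, j + 1)]
                if len(nxt) != len(z) or any(not 0 < b - a <= delta for a, b in zip(z, nxt)):
                    return False  # condition 2: each hop adds a delay in (0, delta]
    return True

session = [("S1", "B", "D1")]
Y = {"S1": [1.0, 2.0, 3.0], "B": [1.5, 2.2, 5.0]}
ok = {(0, 0): [1.0, 2.0], (0, 1): [1.5, 2.2]}
late = {(0, 0): [1.0, 2.0], (0, 1): [1.5, 5.0]}
print(is_valid_relaying(session, ok, Y, 1.0), is_valid_relaying(session, late, Y, 1.0))
# prints: True False
```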

In Section III-C, we also consider a relaxed version of
the delay constraint, where the average delay of packets is
bounded at each relay. The definition for a relaying strategy
with average delay constraint can be obtained by modifying
condition 2 of Definition 2 as:

$$\forall i, j, n: \quad Z_{i,j+1}(n) - Z_{i,j}(n) > 0, \qquad (2)$$

$$\lim_{M \to \infty} \frac{1}{M} \sum_{n=1}^{M} \big(Z_{i,j+1}(n) - Z_{i,j}(n)\big) \le \Delta. \qquad (3)$$
Fig. 3: 2 × 1 Relay with Strict Delay Constraint.
(a) $\mathcal{Y}_X$ is the transmission schedule of node $X$.
(b) Every packet in the set of subsequences satisfies the strict delay constraint $\Delta$ (per-packet delays $\Delta_i \le \Delta$).

D. Performance Metrics

It is possible that the set of subsequences $Z$ is a strict
subset of the transmission schedule $\mathcal{Y}$; in other words, there
are epochs in $\mathcal{Y}$ that do not correspond to any relayed packets.
Those transmission epochs in $\mathcal{Y}$ that are not present in $Z$
either correspond to packets that are eventually dropped
or represent dummy packet transmissions. Therefore, for a
session $s = (P_1, \dots, P_{|s|})$ and relaying schedule $Z$, the rate
of packets relayed from source to destination on route $P_i$ is
given by:

$$\lambda(Z, P_i) = \lim_{n \to \infty} \frac{n}{Z_{i,1}(n)}.$$

Note that, since condition 2 of Definition 2 ensures that all
subsequences on a route have the same length, it is sufficient to use
$Z_{i,1}$ to compute the rate.

Definition 3: Let the session vector $s = (P_1, \dots, P_k)$,
where $P_i \in \mathcal{V}^*$ represents a valid path of data flow. Then,
a rate vector $\lambda(s) = (\lambda_1, \dots, \lambda_k)$ is achievable with strict
delay for session $s$ if $\exists\, q(\mathcal{Y}|s)$ with anonymity $a$ such that

1. Every realization of $\mathcal{Y}$ given $s$ is a valid network
schedule.

2. For every realization of $\mathcal{Y}$, there exists a valid relaying
strategy $Z$ that satisfies

$$\lambda(Z, P_i) \ge \lambda_i, \quad \forall i. \qquad (4)$$

For a large network with several possible session vectors,
characterization of the set of rates for each path of each
session vector is potentially cumbersome. Furthermore, in
order to draw useful inferences on the relationship between
anonymity and network performance, it is helpful to have
a simpler quantity representing the achievable performance.
We, therefore, propose a scalar metric to characterize the
performance of large networks, defined by the average sum-rate
as follows.

Definition 4: $R$ is defined to be a weakly achievable sum-rate
with anonymity $a$ if $\exists\, q(\mathcal{Y}|S)$ with anonymity $a$ such that

1. For every session $s = \{P_1, \dots, P_{|s|}\}$, every realization
of $\mathcal{Y}$ given $s$ is a valid network schedule.

2. For every realization of $(S, \mathcal{Y})$, there exists a valid
relaying strategy $Z_{S,\mathcal{Y}}$, and

$$\mathbb{E}\left[\sum_{i=1}^{|S|} \lambda(Z_{S,\mathcal{Y}}, P_i)\right] \ge R, \qquad (5)$$

where the expectation is over the joint pdf of $\mathcal{Y}$ and $S$.
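The sum-rate metric can be illustrated on finite traces. The sketch below assumes that a single relaying realization per session stands in for the expectation over $\mathcal{Y}$, and that the finite-$n$ ratio $n / Z_{i,1}(n)$ approximates the limit; the session identifiers and epochs are hypothetical.

```python
def path_rate(first_hop):
    """Finite-n estimate of lambda(Z, P_i) = n / Z_{i,1}(n)."""
    return len(first_hop) / first_hop[-1] if first_hop else 0.0

def expected_sum_rate(session_pmf, first_hops):
    """E[sum_i lambda(Z, P_i)] over sessions, one relaying realization per session.

    session_pmf: {session_id: p(s)}
    first_hops:  {session_id: [Z_{1,1}, Z_{2,1}, ...]} relayed epochs at each source
    """
    return sum(p * sum(path_rate(z) for z in first_hops[s])
               for s, p in session_pmf.items())

pmf = {"s_a": 0.5, "s_b": 0.5}
Z1 = {"s_a": [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0]],  # two routes: rates 1.0 and 0.5
      "s_b": [[0.5, 1.0]]}                         # one route: rate 2.0
print(expected_sum_rate(pmf, Z1))  # prints: 1.75
```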

Note that the rate and sum-rate defined above only represent the
rate of packets successfully relayed from sources to destinations.
Since the relaying strategy could result in packet drops
en route to the destinations, the reliability of the achievable
rates needs to be proved by specifying packet encoding and
decoding techniques. We address this issue using forward error
correction in Section III-D.

The fundamental design problem considered in this paper is
to characterize the set of achievable rates with anonymity $a$.
Specifically, we derive achievability results for two scenarios.
For the two-hop network (as shown in Fig. 1), we characterize
the set of achievable rate vectors with maximum anonymity
($a = 1$) under both delay constraints. For a general network,
we use the results from the two-hop network and characterize
the weakly achievable sum-rate for a general $a$.

III. Anonymous Multiaccess Communication 

In this section, we characterize the set of achievable
relay rates with maximum anonymity for the two-hop
network as shown in Fig. 4. In particular,
we provide rate regions for the session vector

$$s_m = \{(S_1,B,D_1), (S_2,B,D_2), \dots, (S_m,B,D_m)\},$$

i.e., the sources $S_1, \dots, S_m$ transmit packets to destinations
$D_1, \dots, D_m$ through relay $B$.



Fig. 4: Two Hop Network: Source $S_i$ transmits packets to
$D_i$ through $B$



A. Independent Scheduling 

In accordance with the definition in Section II-B, scheduling
with perfect anonymity corresponds to independence
between the session vector $S$ and the transmission schedules $\mathcal{Y}$, or,
in other words,

$$H(S|\mathcal{Y}) = H(S) \iff S \perp \mathcal{Y}.$$

We, therefore, propose an independent scheduling technique,
wherein each node in the network generates a random
transmission schedule, statistically independent of the session
and the schedules of other nodes in the network. For example,
in the network shown in Fig. 4 with $m = 2$,

$$q(\mathcal{Y}|s) = q_1(\mathcal{Y}_{S_1})\, q_2(\mathcal{Y}_{S_2})\, q_3(\mathcal{Y}_B),$$

where the distributions $q_i$ do not depend on $S$.

Independent scheduling is a particular solution to maintaining
anonymity in the two-hop setup. An alternative to
independent scheduling is fixed scheduling as
described in [16]. Under that model, all the nodes follow a
fixed synchronous schedule irrespective of transmitted data
rates or paths of information flow. While the fixed scheduling
strategy guarantees maximum anonymity, it results in
a large percentage of dummy packets for low traffic loads.
Further, a fixed schedule requires a centralized synchronous
implementation, which is impractical in large networks.

The relaying algorithms discussed in this section are not
specific to the statistics of the particular transmission processes,
and some of the optimal properties hold for any pair of
point processes. However, for the purpose of analytical
characterization of relay rates, we model the transmission
schedules as independent Poisson point processes.
Poisson processes have typically been used to model the
arrival of packets to nodes in a network, due to the memoryless
property of their interarrival times. Although Poisson schedules cannot
be shown to be optimal under strict delay constraints, under
certain conditions on the physical layer, they are shown to
be optimal for an average delay constraint. Our relaying
algorithms can be used on other point processes, such as Pareto
distributed schedules; however, analytical tractability is not
guaranteed.
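Independent Poisson scheduling can be sketched as follows: each node draws its own Poisson schedule, and the session is never consulted, so $q(\mathcal{Y}|s)$ factorizes over nodes and does not depend on $s$. The `seed` parameter and all names are hypothetical, used only to show that identical randomness yields identical schedules whichever session is active.

```python
import random

def poisson_schedule(rate, horizon, rng):
    """Epochs of a homogeneous Poisson process with the given rate on (0, horizon]."""
    epochs, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential interarrival times
        if t > horizon:
            return epochs
        epochs.append(t)

def independent_schedules(session, rates, horizon, seed):
    """Draw Y = {Y_A} with q(Y|s) = prod_A q_A(Y_A); the session never enters the draw."""
    del session  # intentionally unused: S and Y are independent by construction
    rng = random.Random(seed)
    return {node: poisson_schedule(rates[node], horizon, rng) for node in sorted(rates)}

rates = {"S1": 1.0, "S2": 1.0, "B": 2.0}
y_a = independent_schedules({("S1", "B", "D1"), ("S2", "B", "D2")}, rates, 1000.0, seed=7)
y_b = independent_schedules({("S1", "B", "D2"), ("S2", "B", "D1")}, rates, 1000.0, seed=7)
print(y_a == y_b, len(y_a["B"]) / 1000.0)
```

Because the draw ignores the session, an eavesdropper observing $\mathcal{Y}$ learns nothing about which of the two sessions is active.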



B. Scheduling under Strict Delay 

Consider the special case of a single-source relay (Fig. 4,
$m = 1$). We are interested in the achievable relay rate for the
session $s_1 = \{(S_1, B, D_1)\}$. The medium access constraints
are specified by the bounds $T_{S_1} \le C_{S_1}$, $T_B \le C_B$ on the
transmission rates. If the delay constraint were absent ($\Delta =
\infty$), then each received packet could be relayed by $B$ at the next
available epoch in its transmission schedule. Since packets could
be held for an indefinitely long time, the achievable relay rate
would be $\lambda(Z, (S_1, B, D_1)) = \min\{C_{S_1}, C_B\}$. Note that this
is also the maximum possible rate if node $B$ were to relay
packets without any anonymity requirement.



Fig. 5: Bounded Greedy Match: unmatched packets are dropped;
unused epochs carry dummy packets (figure labels: Incoming, Outgoing, Dropped, Dummy).



When a strict delay constraint of $\Delta$ is imposed, we design
the relaying strategy using the Bounded Greedy Match (BGM)
algorithm proposed in [23] in the context of chaff insertion
in stepping stone attacks. The algorithm (Fig. 5) is described
in Table I. The basic idea is as follows: when a packet arrives
at $B$, if the first departure epoch after the arrival instant that
has not been matched to any previous arrival lies within $\Delta$ of
that instant, it is assigned to the arrived packet. Otherwise, the packet is
dropped. If a relay epoch does not have any packet assigned
to it, the relay transmits a dummy packet at that epoch.



Let $Y_{S_1}(n)$ and $Y_B(n)$ represent the arrival time of the $n$th packet from $S_1$
and the departure time of the $n$th packet from $B$, respectively.

1. Initialize $i = 1$, $j = 1$.

2. Let $t = \min\{Y_{S_1}(i), Y_B(j)\}$.

3. If $t = Y_B(j)$, then
   i. $B$ transmits a dummy packet at time $Y_B(j)$.
   ii. $j = j + 1$.
   else if $Y_B(j) - Y_{S_1}(i) \le \Delta$
   i. $B$ transmits the $i$th packet from $S_1$ at $Y_B(j)$.
   ii. $i = i + 1$, $j = j + 1$.
   else
   i. Drop the $i$th packet that arrived from $S_1$.
   ii. $i = i + 1$.

4. Repeat Steps 2 and 3 until the end of the streams.

TABLE I: Bounded Greedy Match Algorithm
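Table I maps onto a single merge-style pass over the two sorted epoch lists. The Python sketch below follows the steps with 0-based indices; how leftover packets and epochs are treated once one finite stream ends (arrivals dropped, relay epochs filled with dummies) is our assumption, and the function name is illustrative.

```python
def bounded_greedy_match(arrivals, departures, delta):
    """Bounded Greedy Match following Table I (0-based indices).

    arrivals:   sorted arrival epochs Y_{S1}(i) at the relay
    departures: sorted relay transmission epochs Y_B(j)
    Returns (matches, dropped, dummies): matched (arrival, departure) pairs,
    dropped arrival epochs, and departure epochs carrying dummy packets.
    """
    i = j = 0
    matches, dropped, dummies = [], [], []
    while i < len(arrivals) and j < len(departures):
        if departures[j] <= arrivals[i]:
            dummies.append(departures[j])  # t = Y_B(j): nothing waiting, send a dummy
            j += 1
        elif departures[j] - arrivals[i] <= delta:
            matches.append((arrivals[i], departures[j]))  # relay packet i at Y_B(j)
            i += 1
            j += 1
        else:
            dropped.append(arrivals[i])  # first free epoch is too late: drop
            i += 1
    dropped.extend(arrivals[i:])    # assumption: arrivals left over are dropped
    dummies.extend(departures[j:])  # assumption: relay epochs left over carry dummies
    return matches, dropped, dummies

m, d, u = bounded_greedy_match([0.0, 0.5, 3.0], [0.2, 0.4, 2.0, 6.0], delta=1.0)
print(m, d, u)  # prints: [(0.0, 0.2)] [0.5, 3.0] [0.4, 2.0, 6.0]
```

Every relay epoch either carries a matched packet or a dummy, so the relay's observable schedule is exactly the independently generated $\mathcal{Y}_B$.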

It was shown in [23] that this greedy algorithm results
in the fewest packet drops. Based on the algorithm, the following
theorem characterizes the best achievable relay rate for a pair
of independent Poisson processes.

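Theorem 1: If the nodes $S_1$ and $B$ generate independent
Poisson transmission schedules, the maximum achievable relay
rate from $S_1$ to $D_1$ through $B$ is given by $\lambda(Z, (S_1, B, D_1)) =
C_{S_1}(1 - \epsilon(S_1, B))$, where

$$\epsilon(S_1, B) = \begin{cases} \dfrac{C_B - C_{S_1}}{C_B\, e^{\Delta (C_B - C_{S_1})} - C_{S_1}}, & C_{S_1} \neq C_B, \\[2ex] \dfrac{1}{1 + C_B \Delta}, & C_{S_1} = C_B, \end{cases} \;\triangleq\; f_e(C_{S_1}, C_B). \qquad (6)$$

Proof: Refer to Appendix.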

Theorem 1 expresses the maximum achievable rate in terms
of the loss function $\epsilon(S_1, B)$, where $\epsilon(S_1, B)$ represents the
fraction of packets dropped at relay $B$. As the delay constraint
$\Delta$ increases, it is easy to see that the relay rate converges to
$\min\{C_{S_1}, C_B\}$, which is the optimal rate under no anonymity
requirement. Furthermore, the convergence of the relay rate to
the optimal value is exponential in $\Delta$. The value of $\epsilon(S_1, B)$
given in Theorem 1 is obtained when $S_1$ uses the maximum
transmission rate of $C_{S_1}$ for this particular route. In a general
network, $S_1$ could be simultaneously transmitting to another
node, in which case the rate allocated for $\mathcal{Y}_{S_1,B}$ would
be strictly less than $C_{S_1}$. In such a situation, by replacing
$C_{S_1}$ in (6) with the allocated rate for the particular flow,
we can use Theorem 1 to evaluate the corresponding relay rate.

$m \times 1$ Relay: For the general $m \times 1$ relay as shown in Fig. 4,
in the absence of the anonymity constraint, the achievable rate
region can be obtained using the medium access constraints:

$$\Lambda(s_m) = \Big\{(\lambda_1, \dots, \lambda_m) : \lambda_i \le C_{S_i}\ \forall i,\ \sum_i \lambda_i \le C_B\Big\}. \qquad (7)$$

For a finite delay constraint, a trivial achievable rate region
can be obtained if the relay ignores the originating source
of the arriving packets. Specifically, the relay uses the BGM
algorithm on the joint incoming schedule $\bigcup_i \mathcal{Y}_{S_i,B}$ and the
generated outgoing schedule $\mathcal{Y}_B$. For this strategy, the single-source
result in Theorem 1 can be easily extended to characterize
an achievable rate region for $s_m$, which is given in
Corollary 1.

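Corollary 1: There exists a relaying strategy for an $m \times 1$
relay such that the achievable rates $\lambda(s_m) = (\lambda_1, \dots, \lambda_m)$
satisfy

$$\lambda_i = C_{S_i}\big(1 - \epsilon(S_i, B)\big), \quad \forall i, \qquad (8)$$

where

$$\epsilon(S_i, B) \ge f_e\Big(\sum_j C_{S_j}, C_B\Big), \quad \forall i. \qquad (9)$$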
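Theorem 1 and Corollary 1 reduce to evaluating the loss function $f_e(x, y) = \frac{y - x}{y e^{\Delta(y - x)} - x}$ for $x \neq y$ and $f_e(x, y) = \frac{1}{1 + y\Delta}$ for $x = y$, with $y = C_B$ and $x$ the incoming rate. The sketch below computes the single-source relay rate, the boundary point of Corollary 1 (loss evaluated at the merged rate $\sum_j C_{S_j}$, taken with equality), and membership in the no-anonymity region (7). Function names and the overflow guard are our own choices.

```python
import math

def f_e(c_s, c_b, delta):
    """Loss function of Theorem 1: fraction of packets dropped by BGM at the relay."""
    if math.isclose(c_s, c_b):
        return 1.0 / (1.0 + c_b * delta)
    gap = c_b - c_s
    if delta * gap > 700.0:  # math.exp would overflow; the loss is effectively zero
        return 0.0
    return gap / (c_b * math.exp(delta * gap) - c_s)

def theorem1_rate(c_s, c_b, delta):
    """Single-source relay rate C_S * (1 - f_e(C_S, C_B))."""
    return c_s * (1.0 - f_e(c_s, c_b, delta))

def corollary1_rates(c_sources, c_b, delta):
    """Boundary point of Corollary 1: BGM on the merged stream, same loss for every source."""
    loss = f_e(sum(c_sources), c_b, delta)
    return [c * (1.0 - loss) for c in c_sources]

def in_region_7(rates, c_sources, c_b):
    """Membership in the no-anonymity rate region (7)."""
    return all(r <= c for r, c in zip(rates, c_sources)) and sum(rates) <= c_b

# The relay rate approaches min{C_S, C_B} = 1.0 exponentially fast in delta.
for delta in (0.5, 2.0, 8.0):
    print(delta, round(theorem1_rate(1.0, 1.5, delta), 4))
```

With $\Delta = 0$ nothing can be relayed ($f_e = 1$), and for large $\Delta$ the rate tends to $\min\{C_{S_1}, C_B\}$, matching the discussion after Theorem 1.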



Prioritized Scheduling: Ignoring the source identities and
considering the joint stream is strictly sub-optimal. Since the
relay observes a distinct stream from each source node (by
virtue of transmitter directed signaling), the streams can be
prioritized to obtain a larger achievable rate region compared
to Corollary 1.

Consider a $2 \times 1$ relay and assign the highest priority to
$S_1$. For every departure epoch in $\mathcal{Y}_B$, the relay considers
all packets that have arrived within $\Delta$ time units before that
epoch. If some of those packets arrived from $S_1$ (highest
priority), the relay transmits the earliest of those packets at the
chosen epoch. If none of the packets arrived from $S_1$, then the
packet that arrived first (from $S_2$) is transmitted. Since $S_1$ is
given highest priority, this provides the maximum rate
achievable for the stream from $S_1$. The priority algorithm is
formally described in Table II.



1. Initialize $i = 1$, $j = 1$, $k = 1$.
2. If $Y_B(k) - Z_{S_1,B}(i) > \Delta$:
   i. Drop packet $i$ from $S_1$.
   ii. $i = i + 1$. Repeat Step 2.
3. Let $t = \min\{Z_{S_1,B}(i), Y_B(k)\}$.
4. If $t = Y_B(k)$:
   i. Let $t' = \min\{Z_{S_2,B}(j), Y_B(k)\}$.
   ii. If $t' = Y_B(k)$, then $B$ transmits a dummy packet at $t'$; $k = k + 1$.
       Else if $Z_{S_2,B}(j) \ge Y_B(k) - \Delta$, then $B$ transmits the $j$th packet from $S_2$; $j = j + 1$, $k = k + 1$.
       Else drop packet $j$ from $S_2$; $j = j + 1$. Repeat Step 4.i.
   Else $B$ transmits the $i$th packet from $S_1$; $i = i + 1$, $k = k + 1$.
5. Repeat Steps 2-4 until end of streams.

TABLE II: Priority Mapping Algorithm: $S_1$ highest priority
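Table II can be turned into a short simulation. The sketch below is a hypothetical helper, `priority_relay`, that applies the same rules to any number of streams listed in priority order (the extension to the $m \times 1$ relay described below); with two streams it reduces to Table II. Arrival times and relay epochs are assumed to be sorted lists, and packet indices are 0-based.

```python
def priority_relay(streams, epochs, delta):
    """Prioritized relaying in the spirit of Table II (stream 0 has highest priority).

    streams: list of sorted packet arrival times, one list per source, highest priority first
    epochs:  sorted departure epochs Y_B(k) generated independently by the relay
    delta:   strict delay constraint
    Returns (schedule, dropped): schedule holds (epoch, stream, packet) tuples, with
    (epoch, None, None) for a dummy packet; dropped counts, per stream, packets that
    exceeded the delay bound (packets still waiting after the last epoch are not counted).
    """
    heads = [0] * len(streams)        # next unserved packet of each stream
    dropped = [0] * len(streams)
    schedule = []
    for epoch in epochs:
        chosen = None
        for s, arrivals in enumerate(streams):
            # Discard packets that have waited longer than delta (Steps 2 and 4)
            while heads[s] < len(arrivals) and epoch - arrivals[heads[s]] > delta:
                heads[s] += 1
                dropped[s] += 1
            # Highest-priority stream whose head packet has already arrived
            if chosen is None and heads[s] < len(arrivals) and arrivals[heads[s]] < epoch:
                chosen = s
        if chosen is None:
            schedule.append((epoch, None, None))          # no eligible packet: dummy
        else:
            schedule.append((epoch, chosen, heads[chosen]))
            heads[chosen] += 1
    return schedule, dropped

sched, lost = priority_relay([[0.1, 0.5, 3.0], [0.2, 0.3, 2.5]],
                             [0.6, 0.7, 0.8, 2.0, 3.2, 4.0], 1.0)
print(sched, lost)
```

In this example the relay serves both early $S_1$ packets first, then one $S_2$ packet; the epoch at 3.2 goes to $S_1$ because of its priority, and the remaining two $S_2$ packets are dropped after waiting longer than $\Delta = 1$.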

Similarly, by interchanging the priorities, we can obtain the maximum rate for the stream from $S_2$. It is easy to see that when neither source is given priority, the strategy is equivalent to ignoring the origin of packets (Corollary 1). By time-sharing multiple relaying strategies with different priority requirements, a piece-wise linear region of achievable rate vectors is obtained, which is characterized in Theorem 2.

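Theorem 2: If $\lambda(s_2) = (\lambda_1, \lambda_2)$ represents the achievable relay rates for sources $S_1$ and $S_2$ through relay $B$, then

1. $(\lambda_1, \lambda_2)$ is achievable if

$$\lambda_1 \le a_1\lambda_2 + b_1, \quad \lambda_2 \le a_2\lambda_1 + b_2, \quad \lambda_i \le C_{S_i}\left(1 - f_e(\Delta, C_{S_i}, C_B)\right), \tag{10}$$

where the constants $a_i$, $b_i$ ($i \in \{1, 2\}$, with $j \ne i$ denoting the other source) are given by (11) and (12) as functions of $C_{S_i}$, $C_{S_j}$, $C_B$ and $\Delta$.

2. $(\lambda_1, \lambda_2)$ is not achievable if

$$\sum_i \lambda_i > (C_{S_1} + C_{S_2})\left(1 - f_e(\Delta, C_{S_1} + C_{S_2}, C_B)\right) \quad \text{or} \quad \lambda_i > C_{S_i}\left(1 - f_e(\Delta, C_{S_i}, C_B)\right) \text{ for some } i. \tag{13}$$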

Proof: Refer to Appendix. 
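Condition 2 of Theorem 2 can be checked mechanically. The sketch below (hypothetical function names, with $f_e$ assumed to have the same closed form as in Theorem 1) tests whether a rate pair violates (13), and computes the no-priority point of Corollary 1. That point's sum equals the sum-rate bound in (13), which is consistent with the inner and outer bounds touching at the maximal sum-rate point.

```python
import math

def f_e(delta, c_s, c_b):
    """Assumed BGM drop fraction for Poisson rates c_s (arrivals) and c_b (epochs)."""
    d = c_s - c_b
    if abs(d) < 1e-9:
        return 1.0 / (1.0 + delta * c_b)
    if d > 0:
        return d / (c_s - c_b * math.exp(-delta * d))
    x = math.exp(delta * d)                 # equivalent form that avoids overflow
    return d * x / (c_s * x - c_b)

def max_sum_rate(c1, c2, cb, delta):
    """Sum-rate bound of (13): BGM applied to the merged stream of rate c1 + c2."""
    return (c1 + c2) * (1.0 - f_e(delta, c1 + c2, cb))

def violates_outer_bound(l1, l2, c1, c2, cb, delta):
    """True if (l1, l2) meets condition 2 of Theorem 2, i.e. it is not achievable."""
    return (l1 + l2 > max_sum_rate(c1, c2, cb, delta)
            or l1 > c1 * (1.0 - f_e(delta, c1, cb))
            or l2 > c2 * (1.0 - f_e(delta, c2, cb)))

def no_priority_point(c1, c2, cb, delta):
    """Corollary 1 rate pair: both sources see the drop fraction of the merged stream."""
    e = f_e(delta, c1 + c2, cb)
    return c1 * (1.0 - e), c2 * (1.0 - e)

p = no_priority_point(3.0, 4.0, 5.0, 0.5)
print(p, sum(p), max_sum_rate(3.0, 4.0, 5.0, 0.5))
```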

The priority scheduling cannot be proven to obtain the optimal achievable rate region, and so Theorem 2 also provides an outer bound to determine the extent of possible sub-optimality. The outer bound is an upper bound on the sum rate $\lambda_1 + \lambda_2$ that is obtained using the optimality of the BGM algorithm. It can be shown that as $\Delta \to \infty$, the inner and outer bounds converge exponentially fast and coincide in the limit. Although the optimality of the region for Poisson processes is still an open problem, the strategy achieves the maximum possible sum-rate.

The prioritized scheduling can be extended to a general $m \times 1$ relay. Every priority assignment corresponds to an ordering of the sources. When packets from multiple sources contend for a single epoch, the choice of packet to relay is made according to the ordering. Further, by time-sharing strategies for different priority assignments, the complete region can be obtained.



[Plot: Achievable Rate Region]

Fig. 6: $2 \times 1$ Relay rate region. $R_i$ is the rate $\lambda(Z, (S_i, B, D_i))$. The inner and outer bounds coincide at the maximal sum-rate point.

An example region for the $2 \times 1$ relay is shown in Fig. 6. As is evident, the time-sharing strategy results in a piece-wise linear and convex region. The two corner points of the polygon in the figure, which represent the achievable rate pairs when $S_2$ and $S_1$ are respectively given full priority, clearly demonstrate the gains due to prioritized scheduling. Even when $S_1$ is given full priority, the relay rate for $S_2$ is strictly positive. If no priority is used, however, $S_1$ can achieve its maximum rate only when $S_2$ does not transmit at all (region of Corollary 1). The maximum-priority rate pairs can also be viewed as the outcome of successive application of the BGM algorithm on the incoming streams from the two sources, with the order of application determined by the priority assignment.

From Theorems 1 and 2 it is clear that when $C_{S_i}$, $C_B$ and $\Delta$ are finite, the relay rates are strictly less than the transmission rates, thereby resulting in a non-zero packet drop rate. Therefore, the source needs to employ forward error correction (FEC) in order to deliver information to the destination reliably. It can be shown that for very long streams, the coding does not result in further rate reduction (see Section III-D).

C. Average Delay

In this section, we consider the average delay constraint at a relay, as specified by (2) and (3). It is easy to see that achievable rate regions for an average delay constraint of $\Delta$ can be trivially obtained by using the algorithms of Section III-A that assume a strict delay of $\Delta$. This trivial strategy, however, can be significantly improved by modifying the algorithms appropriately.

Consider the single source relay. Let $m(\Delta, C_{S_1}, C_B)$ represent the mean packet delay obtained when the BGM algorithm is applied with strict delay constraint $\Delta$. Since we consider infinitely long streams with an asymptotic constraint, we can choose a strict delay constraint $\Delta^*$ such that the mean delay $m(\Delta^*, C_{S_1}, C_B) = \Delta$.

Theorem 3: $\lambda(Z, (S_1, B, D_1)) = C_{S_1}\left(1 - e(S_1, B)\right)$ is an achievable relay rate for an average delay constraint of $\Delta$ if

$$e(S_1, B) \ge \begin{cases} f_e(\Delta^*, C_{S_1}, C_B), & C_B - C_{S_1} < \dfrac{1}{\Delta},\\ 0, & \text{otherwise,} \end{cases}$$

and $\Delta^*$ is the solution to $m(\Delta^*, C_{S_1}, C_B) = \Delta$, where

$$m(\Delta, C_{S_1}, C_B) = \frac{1 - e^{\Delta(C_{S_1} - C_B)}\left(1 + \Delta(C_B - C_{S_1})\right)}{(C_B - C_{S_1})\left[1 - e^{\Delta(C_{S_1} - C_B)}\right]}.$$

Proof: Refer to Appendix.

For values of $\Delta$ close to zero, the strict delay constraint $\Delta^* \approx 2\Delta$. Therefore, for very small delays, an average delay constraint does not provide significant improvement in achievable rate compared to a strict delay constraint. However, as $\Delta$ increases beyond a certain threshold, the equivalent strict delay $\Delta^*$ increases exponentially. In that regime, an achievable rate close to optimal can be obtained even for a bounded $\Delta$. Furthermore, as is evident from the theorem, when $C_B - C_{S_1} > 1/\Delta$, the strategy achieves zero packet loss. In other words, every transmitted packet can be relayed successfully within the (average) delay constraint.

Since we consider long streams, this strategy could potentially be improved by dividing the stream into a finite number ($N$) of segments, and implementing the BGM algorithm with a different strict delay constraint ($\Delta_i$) in each segment (see Fig. 7). The strict delay constraints should be chosen such that the resulting average delay over the $N$ segments is less than $\Delta$. As the length of the stream increases, each segment $i$ would provide an achievable relay rate $\lambda_i = C_{S_1}\left(1 - f_e(\Delta_i, C_{S_1}, C_B)\right)$ (Theorem 1), and the net achievable rate would be $\frac{1}{N}\sum_{i=1}^{N} \lambda_i$. However, for a pair of Poisson processes, it can be shown that the packet drop fraction $f_e(\Delta_i, C_{S_1}, C_B)$ is a convex function of the strict delay $\Delta_i$, and hence, this segmentation does not reduce packet loss for a fixed average delay.†

Fig. 7: Delay Segmentation: In each segment of the traffic, a different strict delay $\Delta_i$ is chosen.

Using the relation between the strict delay and average delay in Theorem 3, the achievable region for the $m \times 1$ relay can also be obtained by appropriately modifying the strict delay constraint in the prioritized scheduling. The condition on transmission rates for which the priority scheduling strategy is optimal for the $m \times 1$ relay case is a straightforward extension of Theorem 3.

Corollary 2: There exists a scheduling strategy for average delay $\Delta$ that incurs zero packet loss on all incoming streams, if the medium access constraints satisfy:

$$C_B - \sum_i C_{S_i} > \frac{1}{\Delta}.$$
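The equivalent strict delay $\Delta^*$ of Theorem 3 has no closed form, but it can be found numerically. The sketch below solves $m(\Delta^*, C_{S_1}, C_B) = \Delta$ by bisection, assuming $m$ increases with its first argument, and uses the algebraically equivalent form $m = 1/\phi - \Delta/(e^{\Delta\phi} - 1)$ with $\phi = C_B - C_{S_1}$, plus a series expansion near $\phi = 0$ to avoid cancellation. It returns `None` in the zero-loss regime $C_B - C_{S_1} \ge 1/\Delta$; Corollary 2 is the analogous test with $\sum_i C_{S_i}$ in place of $C_{S_1}$. All names are hypothetical, and $f_e$ is assumed to have the closed form used for Theorem 1.

```python
import math

def f_e(delta, c_s, c_b):
    """Assumed BGM drop fraction for Poisson rates c_s (arrivals) and c_b (epochs)."""
    d = c_s - c_b
    if abs(d) < 1e-9:
        return 1.0 / (1.0 + delta * c_b)
    if d > 0:
        return d / (c_s - c_b * math.exp(-delta * d))
    x = math.exp(delta * d)
    return d * x / (c_s * x - c_b)

def mean_delay(delta, c_s, c_b):
    """m(delta, c_s, c_b) of Theorem 3, as 1/phi - delta/(e^(delta*phi) - 1), phi = c_b - c_s."""
    phi = c_b - c_s
    y = delta * phi
    if abs(y) < 1e-3:
        return delta * (0.5 - y / 12.0)       # series expansion near c_b = c_s
    if y > 700.0:
        return 1.0 / phi                      # exponential term is negligible
    return 1.0 / phi - delta / math.expm1(y)

def strict_delay_for_average(avg, c_s, c_b):
    """Solve m(delta*, c_s, c_b) = avg by bisection; None marks the zero-loss case."""
    if c_b - c_s >= 1.0 / avg:
        return None
    lo, hi = 0.0, 2.0 * avg
    while mean_delay(hi, c_s, c_b) < avg:     # expand until the root is bracketed
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_delay(mid, c_s, c_b) < avg:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def average_delay_rate(avg, c_s, c_b):
    """Relay rate of Theorem 3 under an average delay constraint avg."""
    d_star = strict_delay_for_average(avg, c_s, c_b)
    if d_star is None:
        return c_s                            # every packet can be relayed
    return c_s * (1.0 - f_e(d_star, c_s, c_b))

print(strict_delay_for_average(1e-3, 1.0, 1.5))   # close to 2 * 1e-3
print(average_delay_rate(0.8, 2.0, 1.5), 2.0 * (1.0 - f_e(0.8, 2.0, 1.5)))
```

The second line compares the average-delay rate with the rate under a strict delay of the same value; since $\Delta^* > \Delta$, the former is always larger.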



From the results presented so far, it is clear that while 
independent Poisson scheduling generally provides a subset 
of achievable relay rates for strict delay constraints, under 
certain conditions on the medium access, it can be optimal 
for an average delay constraint. An important feature in 
the algorithms presented is that the relays do not require 
prior knowledge about transmission schedules of the source 
nodes. The decision to transmit any packet is based on events 
occurring between its arrival time and the subsequent departure 
epoch. This makes the approach attractive for a decentralized
implementation of the scheduling, which is of particular value
in ad hoc wireless and sensor networks. Note that although
the rate expressions derived are for Poisson processes, the 
algorithms presented are quite general, and can be used on 
any set of point processes. Furthermore, the optimality of the 
BGM algorithm also holds for any pair of point processes. 

D. Reliability 

The independent schedules and relaying algorithms discussed previously result in a strictly non-zero packet drop rate for Poisson processes. Further, since the relay nodes generate schedules in a decentralized manner, it is not possible for the source node to know the identities of packets that would be dropped. This implies that the source nodes must employ forward error correction (FEC) techniques to transmit information reliably to the destination. When the traffic is time sensitive, such as in media transmission, FEC may not be practical, as it would incur significant coding delay. However, if the strict delay constraint is enforced due to low duty cycles (as in sensor networks) or to maintain stability, it is useful to employ coding to ensure reliability of transmission.

†This convexity may not hold for non-Poisson schedules, in which case, the segmentation could potentially increase the achievable relay rate.

In order to analyze the reliability of packet transmissions, 
it is necessary to characterize the channel model between a 
source and destination. For this purpose, if we treat each 
packet as a binary unit of data, then the packet drops can 
be equated to a binary erasure channel. Since packets can be 
appended with indices, the erasure positions would be known 
at the destination node. 

Consider a relay node forwarding packets from a single source. Let $E(i)$ denote the random variable indicating that packet $i$ was dropped (erased) when applying the BGM relay algorithm. Then, using Proposition 4 in [24], it can be shown that the relay rate obtained from Theorem 1 can be achieved reliably.

Lemma 1: The capacity $C$ of the erasure channel for a single source relay after applying the BGM algorithm is

$$C = 1 - \limsup_{n \to \infty} \frac{1}{n}\sum_{i=1}^{n} E(i) = 1 - e(S_1, B),$$

where $e(S_1, B)$ is given by (6).

Proof: Refer to Appendix.

The achievability of this reliable rate, however, requires coding across a long stream of packets. Since prioritized scheduling is equivalent to successive application of the BGM algorithm, the rate region of Theorem 2 also represents reliable rates. In practice, a packet carries many bits rather than a single unit of data, and the FEC differs from that of regular point-to-point communication channels. Coding for packet recovery in networks has been addressed in the literature [25], [26]. In particular, in [25], the authors propose coding schemes where, for every block of information packets, parity packets are transmitted such that for every $i$, the $i$th bit from each packet, arranged in sequence, forms a codeword from an erasure correcting codebook.
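The bitwise arrangement described above can be illustrated with the simplest erasure-correcting code, a single parity check, which recovers any one erased packet per block. This is a stand-in chosen for brevity, not the code construction of [25] or [26], and the helper names are hypothetical. Erasure positions are known because packets carry indices, so the receiver marks a missing packet with `None`.

```python
from functools import reduce

def xor_packets(a, b):
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

def add_parity(block):
    """Append one parity packet whose i-th bit is the XOR of the i-th bits of all packets."""
    return block + [reduce(xor_packets, block)]

def recover(received):
    """received: coded block with None at erased positions (known from packet indices).
    Returns the information packets, or raises if more than one packet was erased."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity check corrects only one erasure")
    received = list(received)
    if missing:
        received[missing[0]] = reduce(xor_packets, [p for p in received if p is not None])
    return received[:-1]                      # strip the parity packet

coded = add_parity([b"abcd", b"wxyz", b"1234"])
coded[1] = None                               # packet 1 dropped at a covert relay
print(recover(coded))
```

Stronger codes from the same family (several parity packets per block) tolerate more erasures per block at the cost of more redundancy, which is what a non-zero drop rate $e(S_1, B)$ calls for.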

IV. Sum-Rate Secrecy Region 

The achievability results presented in the previous section can be viewed as the basic building blocks for hiding routes in a network. While the independent scheduling idea can be directly extended to multihop routes, characterizing rate regions for large networks is cumbersome and not practical. Furthermore, Theorem 2 in [27] shows that under certain conditions, for an $n$-hop path with independent Poisson schedules, the maximum rate of packets that can be relayed to the destination with a strict delay constraint decays exponentially as $n$ increases. Therefore, instead of directly extending the idea, we propose to utilize independent scheduling at selected portions of the network depending on the required level of anonymity $\alpha$.

As an example, consider the switching network shown in Fig. 8. During any network session, each source $S_i$ picks a distinct destination $D_j$. It is easy to see that given the $S_i, D_j$ pairings, there is a unique set of paths in the session $\mathbf{S}$. If no anonymity is required, each relay would transmit a received packet after a negligible processing delay, thereby incurring no packet drops. Assuming each node has a transmission rate of $C$, the average sum-rate achievable would be $2C$ (the min-cut would be out of $M_1, M_3$). Since the schedules of all the relays are dependent on the arrival processes, the eavesdropper would be able to detect the relaying operation of the nodes $M_1, \ldots, M_4$. However, since nodes utilize transmitter directed signaling with encrypted headers, the eavesdropper would not be able to ascertain the final destination nodes of any path. In this case, it can be shown that the anonymity level $\alpha = H(\mathbf{S}|\hat{\mathbf{S}})/H(\mathbf{S}) = 0.436$.

Fig. 8: Switching Network: Sources $\{S_i\}$ transmit packets to destinations $\{D_i\}$ through relays $\{M_i\}$.

On the other hand, complete independent scheduling would imply that the relays $M_1, \ldots, M_4$ generate statistically independent schedules. Such a strategy would provide maximum anonymity $\alpha = 1$, but result in a reduced achievable sum-rate given by $2C(1 - \epsilon_1)(1 - \epsilon_2)$, where $\epsilon_1, \epsilon_2$ are the packet losses incurred at relays $M_1, M_3$ and $M_2, M_4$ respectively.

Suppose only $M_1, M_3$ were to generate independent schedules, while $M_2, M_4$ relayed packets immediately. Then the eavesdropper would be able to observe a portion of the paths. In that case, it can be shown that the anonymity level $\alpha = H(\mathbf{S}|\hat{\mathbf{S}})/H(\mathbf{S}) = 0.65$ (refer to Appendix for details). However, since only one relay in each path drops packets, the achievable sum-rate increases to $2C(1 - \epsilon_1)$.
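The three strategies can be tabulated side by side. In this sketch the loss fractions $\epsilon_1 = \epsilon_2 = 0.3$ are hypothetical, $C = 2$ follows Fig. 11, and the 24 equiprobable sessions come from Section VI. The anonymity values rest on an additional assumption: that each eavesdropper observation leaves 4, 8, or 24 equally likely sessions, so that $\alpha = \log_2(n)/\log_2(24)$; under that assumption the numbers reproduce the 0.436 and 0.65 quoted above.

```python
import math

C = 2.0                     # per-node transmission rate, as in Fig. 11
EPS1, EPS2 = 0.3, 0.3       # hypothetical loss fractions at M1/M3 and M2/M4
H_S = math.log2(24)         # 24 equiprobable sessions (Section VI)

# name: (sum-rate, assumed number of equiprobable sessions consistent with one observation)
STRATEGIES = {
    "all visible": (2 * C, 4),
    "M1, M3 covert": (2 * C * (1 - EPS1), 8),
    "all covert": (2 * C * (1 - EPS1) * (1 - EPS2), 24),
}

def tradeoff_table():
    table = {}
    for name, (rate, n_consistent) in STRATEGIES.items():
        alpha = math.log2(n_consistent) / H_S     # alpha = H(S | S_hat) / H(S)
        table[name] = (rate, alpha)
    return table

for name, (rate, alpha) in tradeoff_table().items():
    print(f"{name:14s} sum-rate = {rate:.2f}   alpha = {alpha:.3f}")
```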

This simple example illustrates the trade-off between 
achievable network performance and the level of anonymity. 
In the remainder of this section, we shall formalize these 
ideas, describe a randomized relaying strategy and provide 
an analytical characterization of the achievable sum-rate as a 
function of anonymity. 

A. Relay Categories 

As suggested in the example, the key idea we exploit is to 
divide the set of relays according to their scheduling strategies. 
Specifically, we categorize the relays into two types: covert 
and visible relays. 

Covert Relays: A relay $M$ is covert if it generates a transmission schedule statistically independent of the schedules of all nodes occurring previously in the paths that contain $M$. For example, if only path $P = (A_1, \ldots, A_k, M, A_{k+1}, \ldots)$ contains $M$, then $M$ is covert if its transmission schedule is independent of the schedules of $A_1, \ldots, A_k$. Further, if $M$ relays packets from $k$ nodes, then it uses the BGM algorithm on the joint incoming packet stream to optimally match the departure epochs. Since our criterion is to maximize sum-rate, the nodes are given equal priority, which is the sum-rate optimal strategy (Theorem 2).

[Figure: covert relaying (uncorrelated schedules) and visible relaying (correlated schedules)]
Fig. 9: Visible and Covert Relaying.

Visible Relays: A visible relay $M$ generates its schedule based on the schedules of nodes transmitting packets to $M$. For every received packet, the relay schedules an epoch after a processing delay (negligible compared to $\Delta$). It is evident that a relay operating under this highly correlated schedule would be easily detected by an eavesdropper. It is important to note that, although some received packets from the transmitting node may be dummy packets, these are also relayed by a visible node. The reason is that, if dummy packets that were generated due to independent scheduling at a previous node were dropped by the visible relay, then the new stream would no longer be independent from the node two hops earlier (see Fig. 10). We assume that for visible relays, the eavesdropper makes a perfect detection of the relaying operation.

[Figure panels: Covert Relay, Visible Relay; schedules $\mathcal{Y}_1$, $\mathcal{Y}_2$]
Fig. 10: Relaying Dummy Packets: $\mathcal{Y}_1$ and $\mathcal{Y}_2$ are statistically independent. If the dummy packets (represented in green) are not relayed, the processes $\mathcal{Y}_1$ and $\mathcal{Y}_2$ will be dependent.

By appropriately selecting which relays should be covert in a session, we can guarantee the required level of anonymity. A trivial strategy would be to let all nodes act as covert relays in a session. However, since the independent schedules would result in packet loss at every covert relay, network throughput would be reduced significantly. It is, therefore, necessary to pick the covert relays optimally so that anonymity is guaranteed with minimum loss in throughput.

We assume the transmission times of packets by each source node in a session are generated according to an independent Poisson process. To maintain uniformity in traffic schedule patterns, we let the covert relays also generate independent Poisson processes. Given a session $\mathbf{S}$, let $\mathbf{B}$ represent the set of relay nodes that are chosen to be covert. Given $\mathbf{S}, \mathbf{B}$, using the relaying algorithms discussed in the previous section, the schedules $\mathcal{Y}$ and the relayed subsequences $Z$ can be generated for all nodes in the network.



B. Eavesdropper Observation 

We assume that when a relay is visible, the eavesdropper perfectly correlates the schedules transmitted by a preceding node and the relay. As a result, depending on the set of visible relays, the eavesdropper makes a partial detection of the paths of a session. We denote this partial observation as a set of paths, $\hat{\mathbf{S}} \in 2^{\mathcal{P}(\mathcal{G})}$. Given the observation $\hat{\mathbf{S}}$, the eavesdropper would try to infer the actual session $\mathbf{S}$. The partial observation $\hat{\mathbf{S}}$ can be expressed as a function of the actual session $\mathbf{S}$ and the set of covert relays $\mathbf{B}$.

We define the function $t: 2^{\mathcal{P}(\mathcal{G})} \times (\mathcal{V} \cup \{\emptyset\}) \to 2^{\mathcal{P}(\mathcal{G})}$ to characterize the eavesdropper's observation when at most one relay is covert. For a set of paths $\mathcal{P}$, $t(\mathcal{P}, B)$ contains the observed paths when only node $B$ is covert. If $B = \emptyset$, then $t(\mathcal{P}, \emptyset)$ is obtained by removing the destination nodes from every path in $\mathcal{P}$. This is because, even if all relays are visible, transmitter directed signaling ensures that it is not possible to detect the final destination in any route. If $B \ne \emptyset$, then a path $P \in \mathcal{P}(\mathcal{G})$ belongs to $t(\mathcal{P}, B)$ if and only if it satisfies one of the following conditions:

1. $\exists P' = (A_1, \ldots, A_k, B, A_{k+1}, \ldots) \in \mathcal{P}$ such that $P = (A_1, \ldots, A_k)$ or $P = (B, A_{k+1}, \ldots)$.

2. $P \in \mathcal{P}$ and $B \notin P$.

Condition 1 states that, when a path in $\mathcal{P}$ contains a covert relay, the eavesdropper would observe two different paths, one terminating before $B$ and the other originating from node $B$. Condition 2 states that a path that does not contain a covert relay is fully observed. When a subset $\mathbf{B} = \{B_1, \ldots, B_m\} \subset \mathcal{V}$ of relays is covert, $\hat{\mathbf{S}}$ can be obtained by repeated application of $t$:

$$\hat{\mathbf{S}} = t\left(\cdots t\left(t(\mathbf{S}, \emptyset), B_1\right) \cdots, B_m\right) \triangleq T(\mathbf{S}, \mathbf{B}). \tag{14}$$

It can be shown that the set $\hat{\mathbf{S}}$ in the above equation represents the eavesdropper's sufficient statistic (part of the proof of Theorem 4).
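The observation map of (14) can be written directly over paths represented as tuples of node names. In this sketch `relay=None` plays the role of $B = \emptyset$, covert relays are applied in sorted order, and the two toy sessions are hypothetical (they are not claimed to be the network of Fig. 8); the function names simply mirror $t$ and $T$.

```python
def t(paths, relay):
    """One application of t(P, B); relay=None stands for B = empty set."""
    if relay is None:
        return frozenset(p[:-1] for p in paths)      # final destinations are hidden
    observed = set()
    for p in paths:
        if relay in p:
            k = p.index(relay)
            if k > 0:
                observed.add(p[:k])                  # part ending before the covert relay
            observed.add(p[k:])                      # part starting at the covert relay
        else:
            observed.add(p)                          # no covert relay: seen in full
    return frozenset(observed)

def T(session, covert):
    """Repeated application of t, as in (14)."""
    observed = t(session, None)
    for relay in sorted(covert):
        observed = t(observed, relay)
    return observed

session = {("S1", "M1", "M2", "D1"), ("S2", "M1", "M4", "D3")}
print(sorted(T(session, ["M1"])))
```

With `M1` covert, a second session in which the two sources swap routes after `M1` produces exactly the same observation, which is the ambiguity that the equivocation $H(\mathbf{S}|\hat{\mathbf{S}})$ measures.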

C. Throughput Function 

In order to design the optimal selection strategy, we first characterize the loss in sum-rate when a deterministic set of relays is covert in a session. The relaying strategies in Section III-A were designed to minimize the packet loss at a single covert relay. Extending those results to multihop routes, we can characterize the loss in sum-rate of each session $\mathbf{S}$ when a subset of relays $\mathbf{B}$ is covert.

If we ignore the anonymity requirement, the best throughput in the network is achieved when all relays are visible. Each session $\mathbf{S}$ corresponds to a maximum achievable sum-rate obtained using the max-flow that satisfies medium access constraints. Specifically, let $\boldsymbol{\lambda}^*(\mathbf{S}) = (\lambda^*_1, \ldots, \lambda^*_{|\mathbf{S}|})$ represent the vector of achievable relay rates for the paths in session $\mathbf{S}$ with no covert relays, and let $\lambda^*_\Sigma(\mathbf{S})$ be the maximum achievable sum-rate. If $\mathbf{S} = (P_1, \ldots, P_{|\mathbf{S}|})$, then, using the forwarding strategy for visible relays, the maximum achievable sum-rate is the solution to:

$$\lambda^*_\Sigma(\mathbf{S}) = \max\left(\lambda^*_1 + \cdots + \lambda^*_{|\mathbf{S}|}\right) \tag{15}$$

$$\text{subject to} \quad \sum_{i: B \in P_i} \lambda^*_i \le C_B, \quad \forall B \in \mathcal{V}. \tag{16}$$

Therefore, our performance metric when anonymity $\alpha = 0$ is the maximum expected sum-rate given by

$$R(\alpha = 0) = E\left(\lambda^*_\Sigma(\mathbf{S})\right),$$

where the expectation is over the prior $p(\mathbf{S})$. Although in practice the actual rates of flows depend on the nature of data and network application, the maximum sum-rate is a metric that represents the fundamental limits of achievable performance.

When a subset of relays are covert, the achievable sum- 
rate in each session is reduced depending on the fraction of 
packets dropped at each covert relay. The net relay rate for 
each path is obtained by multiplying the fraction of packets 
that are relayed at every covert relay in that path. 

Specifically, let $\boldsymbol{\lambda}^c(\mathbf{S}, \mathbf{B}) = (\lambda^c_1, \ldots, \lambda^c_{|\mathbf{S}|})$ represent the achievable relay rates from sources to destinations for a session $\mathbf{S} = (P_1, \ldots, P_{|\mathbf{S}|})$ when nodes in $\mathbf{B}$ are covert, and let $\lambda^c_\Sigma(\mathbf{S}, \mathbf{B}) = \sum_i \lambda^c_i$ be the achievable sum-rate. If $A(i, j)$ represents the $j$th node in path $P_i$, then

$$\lambda^c_i = \lambda^*_i \prod_{j: A(i,j) \in \mathbf{B} \cap P_i} \left(1 - e_i\left(A(i, j-1), A(i, j)\right)\right), \tag{17}$$

where $e_i(A, B)$ represents the fraction of packets transmitted by node $A$ on path $P_i$ that are dropped by covert relay $B$. Note that Theorems 1 and 2 provide the closed form expression for $e_i(A, B)$ if $B$ is the first covert relay in path $i$. Since the departure epochs of data packets from a covert relay do not constitute a Poisson process, the expression cannot be applied to subsequent covert relays. The analytical characterization of multiple covert relays is generally cumbersome, but can be obtained numerically.
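Equation (17) multiplies per-relay survival fractions along each path. The sketch below evaluates $\lambda^c_\Sigma(\mathbf{S}, \mathbf{B})$ from given visible-relay rates $\lambda^*_i$; the drop fractions are supplied through a hypothetical callback `drop(i, prev, relay)` standing in for $e_i(A(i, j-1), A(i, j))$, which would come from Theorems 1 and 2 for the first covert relay on a path and from numerical evaluation otherwise. The paths and the constant 20% loss in the example are made up.

```python
def covert_sum_rate(paths, visible_rates, covert, drop):
    """Sum of lambda^c_i in (17).

    paths:         list of node tuples (source, relays..., destination)
    visible_rates: lambda*_i for each path (all relays visible)
    covert:        set of covert relays B
    drop(i, a, b): fraction of path-i packets sent by a that covert relay b drops
    """
    total = 0.0
    for i, (path, rate) in enumerate(zip(paths, visible_rates)):
        for j in range(1, len(path)):
            if path[j] in covert:
                rate *= 1.0 - drop(i, path[j - 1], path[j])
        total += rate
    return total

paths = [("S1", "M1", "M2", "D1"), ("S2", "M3", "M4", "D2")]
print(covert_sum_rate(paths, [1.0, 1.0], {"M1", "M2"}, lambda i, a, b: 0.2))
```

Placing both covert relays on one path (1.64 in the example) costs less than spreading them over two paths (1.6), which is one reason the choice of $\mathbf{B}$ matters.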

Although the solution of the optimization in (15) and (16) specifies a set of transmission rates for the nodes, we know from Theorems 1 and 2 that increasing the transmission rates of nodes results in lower packet losses for statistically independent schedules. Therefore, if the relay immediately following a source node is covert, the source node could transmit at the maximum rate possible to minimize packet losses. In other words, if $A$ is a source node, then $T_A = \sum_{i: A \in P_i} \lambda^*_i$ can be increased to $C_A$. Since only the source is allowed to perform forward error correction, it does not help to increase transmission rates of subsequent relays (as we would only get additional dummy packets).

V. Performance Characterization 

With the eavesdropper observation of (14) and the throughput characterization in (17), we now have all the elements required to maximize throughput with anonymity $\alpha$. Prior to describing the general randomized strategy, to ease understanding, we first discuss a simple deterministic strategy to obtain a smaller region of achievable sum-rate anonymity pairs. Then, expanding on that idea, we provide the generalized strategy to characterize the sum-rate anonymity region.

Deterministic Covert Scheduling: A direct optimization of (17) provides a deterministic strategy to characterize achievable sum-rates under anonymity constraints. Specifically, a subset $\mathbf{B}$ of relays is chosen to remain covert for all sessions, such that the sum-rate is maximized without violating the anonymity requirement.

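Theorem 4: A sum-rate $R$ is achievable with anonymity $\alpha$ if

$$R \le \max_{\mathbf{B}: H(\mathbf{S}|\hat{\mathbf{S}}) \ge \alpha H(\mathbf{S})} E\left[\lambda^c_\Sigma(\mathbf{S}, \mathbf{B})\right],$$

where $\hat{\mathbf{S}} = T(\mathbf{S}, \mathbf{B})$.

Proof: Refer to Appendix.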

Depending on the level of anonymity required, the strategy picks one subset of nodes that are always covert (for all sessions). Since the number of possible subsets is finite, the achievable sum-rate anonymity region would be constant within intervals of $\alpha$, with sudden jumps corresponding to a change in the optimal subset (see example in Section VI).
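For small networks, the deterministic strategy of Theorem 4 can be found by exhaustive search over subsets $\mathbf{B}$. Since $\hat{\mathbf{S}} = T(\mathbf{S}, \mathbf{B})$ is a deterministic function of $\mathbf{S}$, $H(\mathbf{S}|\hat{\mathbf{S}})$ reduces to the entropy within groups of sessions that produce the same observation. The sketch assumes the observation map and the per-session sum-rate are supplied as callables (`observe` standing in for $T$, `sum_rate` for $\lambda^c_\Sigma$); these names and the toy example are hypothetical, and the search is exponential in the number of relays.

```python
import math
from collections import defaultdict
from itertools import combinations

def conditional_entropy(prior, observe):
    """H(S | S_hat) in bits when S_hat = observe(S) is deterministic."""
    groups = defaultdict(list)
    for s, ps in prior.items():
        groups[observe(s)].append(ps)
    h = 0.0
    for probs in groups.values():
        p_obs = sum(probs)                      # p(S_hat = s_hat)
        h -= sum(ps * math.log2(ps / p_obs) for ps in probs if ps > 0)
    return h

def best_deterministic_covert_set(prior, relays, observe, sum_rate, alpha):
    """Exhaustive search for Theorem 4: maximize E[sum_rate(S, B)] subject to
    H(S | T(S, B)) >= alpha * H(S). Returns (expected sum-rate, B) or None."""
    h_s = -sum(ps * math.log2(ps) for ps in prior.values() if ps > 0)
    best = None
    for k in range(len(relays) + 1):
        for b in combinations(relays, k):
            h = conditional_entropy(prior, lambda s: observe(s, b))
            if h + 1e-12 < alpha * h_s:         # anonymity requirement violated
                continue
            rate = sum(ps * sum_rate(s, b) for s, ps in prior.items())
            if best is None or rate > best[0]:
                best = (rate, b)
    return best

# Toy example: making relay "M" covert hides the session but costs 40% of the rate.
prior = {"x": 0.5, "y": 0.5}
observe = lambda s, b: "hidden" if "M" in b else s
rate = lambda s, b: 0.6 if "M" in b else 1.0
print(best_deterministic_covert_set(prior, ["M"], observe, rate, 0.0))
print(best_deterministic_covert_set(prior, ["M"], observe, rate, 1.0))
```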

The above theorem provides one set of achievable sum-rates as a function of anonymity $\alpha$. As mentioned in Section II-B, equivocation is an average metric. It gives a lower bound on the average probability of error for the adversary. Furthermore, the performance is also measured by an average sum-rate metric. Therefore, by time-sharing multiple strategies, it is possible to obtain a convex region without violating the anonymity constraint.

For example, let two subsets of covert relays $\mathbf{B}_1$ and $\mathbf{B}_2$ correspond to achievable sum-rate anonymity pairs $(R_1, \alpha_1)$ and $(R_2, \alpha_2)$. At the beginning of every session, one of the subsets $\mathbf{B}_1, \mathbf{B}_2$ is chosen with probability $\frac{1}{2}$. Then, it is possible to obtain an achievable sum-rate anonymity pair $\left(\frac{R_1 + R_2}{2}, \frac{\alpha_1 + \alpha_2}{2}\right)$. In general, any convex combination of sum-rate anonymity pairs is achievable by time-sharing.

Corollary 3: Let

$$\mathcal{R}^{\mathrm{det}} = \left\{(R, \alpha) : R \text{ is an achievable sum-rate with anonymity } \alpha \text{ under Theorem 4}\right\}.$$

Then, every $(R, \alpha) \in \text{convex-hull}(\mathcal{R}^{\mathrm{det}})$ is achievable.
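Corollary 3 amounts to taking the upper concave envelope of the $(\alpha, R)$ pairs produced by deterministic strategies and time-sharing between neighbouring vertices. A minimal sketch, assuming the pairs are already computed; the function names and the numbers in the example are hypothetical.

```python
def upper_envelope(points):
    """Upper concave envelope of (alpha, R) pairs, as vertices sorted by alpha."""
    best = {}
    for a, r in points:                         # keep the largest R for each alpha
        best[a] = max(r, best.get(a, r))
    hull = []
    for p in sorted(best.items()):
        # Drop the last vertex while it lies on or below the chord to the new point
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def time_shared_rate(hull, alpha):
    """Sum-rate at anonymity alpha obtained by time-sharing two adjacent vertices."""
    if not hull[0][0] <= alpha <= hull[-1][0]:
        raise ValueError("alpha outside the range covered by the strategies")
    for (a1, r1), (a2, r2) in zip(hull, hull[1:]):
        if a1 <= alpha <= a2:
            t = (alpha - a1) / (a2 - a1)        # fraction of sessions using the second strategy
            return (1 - t) * r1 + t * r2
    return hull[0][1]                           # single-vertex envelope

hull = upper_envelope([(0.0, 4.0), (0.4, 3.9), (0.5, 3.0), (1.0, 2.0)])
print(hull, time_shared_rate(hull, 0.5))
```

Here the deterministic strategy at $\alpha = 0.5$ lies below the envelope: time-sharing the strategies at $\alpha = 0.4$ and $\alpha = 1$ yields $R \approx 3.58$ instead of 3.0 at the same anonymity level.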

Randomized Covert Scheduling: The drawback in the strategies discussed above is that the subset $\mathbf{B}$ is chosen independently of the session $\mathbf{S}$. The generalized strategy is to choose the set of covert relays as a random function of the session $\mathbf{S}$. We model the set of covert relays $\mathbf{B}$ as a random variable with a conditional probability mass function $\{q(\mathbf{B}|\mathbf{S}) : \mathbf{B} \in 2^{\mathcal{V}}\}$. The goal is to optimize the conditional p.m.f. $\{q(\mathbf{B}|\mathbf{S})\}$ so that the achievable sum-rate is maximized for a given level of anonymity $\alpha$. Obtaining the best distribution could typically be done using a brute force optimization over a large dimensional simplex, which is computationally intensive and impractical for large networks. However, the following result proves the duality of this problem to information theoretic rate-distortion, which can then be used to efficiently obtain the optimal strategy and characterize the optimal sum-rate $R(\alpha)$.



SUBMITTED TO IEEE TRANSACTIONS ON INFORMATION THEORY 






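Theorem 5: Let $d: 2^{\mathcal{P}(\mathcal{G})} \times 2^{\mathcal{P}(\mathcal{G})} \to \mathbb{R}^+ \cup \{\infty\}$ s.t.

$$d(\mathbf{S}, \hat{\mathbf{S}}) = \begin{cases} \lambda^*_\Sigma(\mathbf{S}) - \lambda^c_\Sigma(\mathbf{S}, \mathbf{B}), & \exists \mathbf{B} \text{ s.t. } \hat{\mathbf{S}} = T(\mathbf{S}, \mathbf{B}),\\ \infty, & \text{otherwise.} \end{cases} \tag{18}$$

Then, a sum-rate $R(\alpha)$ is achievable with anonymity $\alpha$ if

$$R(0) - R(\alpha) \ge D\left(H(\mathbf{S})(1 - \alpha)\right),$$

where $D(r)$ is the distortion-rate function defined as

$$D(r) = \min_{q(\hat{\mathbf{S}}|\mathbf{S}) : I(\mathbf{S}; \hat{\mathbf{S}}) \le r} E\left(d(\mathbf{S}, \hat{\mathbf{S}})\right). \tag{19}$$

Proof: Refer to Appendix.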



The above theorem provides $R(\alpha)$ using the single-letter characterization of a rate-distortion function. The loss function $d(\mathbf{S}, \hat{\mathbf{S}})$ represents the reduction in sum-rate due to covert relaying. Although the loss function parameters do not explicitly include the set of covert relays $\mathbf{B}$, it can be shown that given $\mathbf{S}, \hat{\mathbf{S}}$, the set of covert relays $\mathbf{B}$ is unique (see proof of Theorem 5). Therefore, the distribution $q(\mathbf{B}|\mathbf{S})$ used to choose covert relays is equivalent to the distortion-minimizing distribution in (19). As a result, the Blahut-Arimoto algorithm [28] provides an efficient iterative technique to obtain $q(\mathbf{B}|\mathbf{S})$ and the achievable sum-rate $R(\alpha)$. Note that the anonymity $\alpha$ is guaranteed assuming that the eavesdropper is aware of the network topology, the session prior distribution $p(\mathbf{S})$ and the optimal strategy $q(\mathbf{B}|\mathbf{S})$ of choosing covert relays.
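The sketch below is the textbook Blahut-Arimoto iteration for one point of a distortion-rate curve, parameterized by a slope $\beta > 0$ (sweeping $\beta$ traces the curve, and a particular $D(r)$ can be located by searching over $\beta$); it is not the authors' implementation. In the setting of Theorem 5 the source symbol would be the session $\mathbf{S}$ with prior $p(\mathbf{S})$, the reproduction symbol an observation $\hat{\mathbf{S}}$, and `d` the loss (18). Entries equal to `math.inf` model pairs with no $\mathbf{B}$ such that $\hat{\mathbf{S}} = T(\mathbf{S}, \mathbf{B})$, and the iteration gives them zero probability; every row has at least one finite entry because $\mathbf{B} = \emptyset$ is always feasible. Rates are in bits.

```python
import math

def blahut_arimoto_dr(p, d, beta, iters=200):
    """One point of the distortion-rate curve for slope parameter beta > 0.

    p[x]      prior of source symbol x (here: a session S)
    d[x][y]   distortion between x and reproduction y (here: an observation S_hat);
              math.inf marks pairs that can never occur
    Returns (rate in bits, expected distortion, q) with q[x][y] = q(y | x).
    """
    nx, ny = len(p), len(d[0])
    r = [1.0 / ny] * ny                          # reproduction marginal, start uniform
    for _ in range(iters):
        q = []
        for x in range(nx):
            # q(y|x) is proportional to r(y) exp(-beta d(x, y)); exp(-inf) == 0.0
            w = [r[y] * math.exp(-beta * d[x][y]) for y in range(ny)]
            z = sum(w)
            q.append([wy / z for wy in w])
        r = [sum(p[x] * q[x][y] for x in range(nx)) for y in range(ny)]
    pairs = [(x, y) for x in range(nx) for y in range(ny) if p[x] > 0 and q[x][y] > 0]
    dist = sum(p[x] * q[x][y] * d[x][y] for x, y in pairs)
    rate = sum(p[x] * q[x][y] * math.log2(q[x][y] / r[y]) for x, y in pairs)
    return rate, dist, q

rate, dist, _ = blahut_arimoto_dr([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], beta=2.0)
print(rate, dist)
```

For a uniform binary source with Hamming distortion, the returned pair satisfies rate $= 1 - h(D)$ with $D = 1/(1 + e^{\beta})$, the known binary rate-distortion curve, which is a convenient sanity check before applying the iteration to sessions and observations.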

A. Discussion 

The equivalence between anonymous networking and rate distortion is not tied to our strategy of choosing covert relays, as explained in Section II-A. In our model, the level of anonymity $\alpha$ directly corresponds to the rate of compression, and the performance loss function plays the role of distortion. Therefore, obtaining the optimal rate-distortion function is equivalent to obtaining the throughput-anonymity relation.

We believe that the consequences of this duality extend 
beyond the characterization of the tradeoff between anonymity 
and throughput. Rate distortion is a field that has been studied 
for many decades [21], and the numerous models and tech- 
niques developed therein could serve to design strategies for 
anonymous networking. For example, in our setup, the Blahut- 
Arimoto algorithm provides an efficient iterative technique to 
obtain the optimal distribution of covert relays in a session. 

In our current setup, we have considered independent ses- 
sions of observation, which may not apply to the scenario 
where an eavesdropper monitors the network for long periods 
of time. In that case, we would need a stochastic model to 
account for session changes, depending on when nodes start 
or stop communication. Based on the duality we believe that, 
if we adopt a Markovian model for the session evolution, then 
techniques in causal source coding [29] would provide possible 
solutions. 

We currently model the entire session as a single entity (the variable $\mathbf{S}$), which may not be practical to analyze in a large scale network. This model should be broken down so that each route is protected independently, depending on the level of anonymity required by that particular route. One approach towards such a model would be to express the set of routes as a sequence of links, rather than a single session variable. Each session would then correspond to a source sequence, and the distortion measure would depend on the relative levels of anonymity required by routes. The challenge in developing such a model, however, is to account for eavesdroppers correlating schedules across multiple hops.

VI. Example 

Consider the switching example given in the beginning of Section IV (Fig. 8). During any network session, each source $S_i$ picks a distinct destination $D_j$. The set of sessions $\mathcal{S}$ contains 24 elements, which are assumed equiprobable. For this example, Fig. 11 plots the sum-rate anonymity region for the deterministic and probabilistic strategies discussed previously.
Fig. 11: Sum-Rate Anonymity Region for $4 \times 4$ switching network with $C = 2$.

The sum-rate anonymity region is convex, as seen in the figure. This is because the performance metrics, namely anonymity and throughput, are average quantities, which allows time-sharing to convexify any set of achievable rates. The figure clearly demonstrates the performance improvement due to the randomized covert scheduling. As can be seen, when all relays are visible, the maximum sum-rate $2C$ is achieved with a strictly positive secrecy level. This is because, given the transmission stream from relay $M_2$ (or $M_4$), it is not possible for the eavesdropper to detect which packets are received by each destination node. Another interesting observation is that it suffices to make only some of the relays covert in order to obtain perfect anonymity. This shows that, although making all relays covert ensures perfect secrecy, it may not be necessary.

VII. Conclusions 

One of our key contributions in this work is the theoretical 
model for anonymity against traffic analysis. To the best of 
our knowledge, this is the first analytical metric designed to 
measure the secrecy of routes in an eavesdropped wireless 
network. Based on the metric, we designed scheduling and
relaying strategies to maximize network performance with a 
guaranteed level of anonymity. Although we consider specific 
constraints on delay and bandwidth, the ideas of covert re- 
laying and the randomized selection are quite general, and 
apply to arbitrary multihop wireless networks. The throughput- 
anonymity tradeoff we obtain reiterates the known paradigm of 
inverse relationship between communication rate and secrecy 
in covert channels. 

In this work, we used throughput as an indicator of net- 
work performance and optimized the selection strategy. How- 
ever, the framework we establish extends beyond maximizing 
throughput. In fact, the loss function we define in (18) can
be redefined to represent the loss in any convex function of 
the achievable relay rates. Further, instead of fixing the packet 
delay and minimizing the loss in sum-rate, we could fix the 
rates of transmission and analyze the increase in latency at 
every covert relay. By optimally designing the loss function 
to reflect the increase in overall network latency, we would 
be able to derive the relationship between latency and level of 
anonymity. 

Appendix 

Proof of Theorem Q] 

To prove the theorem, we adopt the technique used in [20].
Consider the two point processes $\mathcal{Y}_{S_1}$ and $\mathcal{Y}_B$. Let $X_j$ be the $j$th
packet delay, i.e., $X_j = Y_B(j) - Y_{S_1}(j)$, and define

$$Z_j = X_j - X_{j-1} = \big(Y_B(j) - Y_B(j-1)\big) - \big(Y_{S_1}(j) - Y_{S_1}(j-1)\big).$$

We see that the $Z_j$'s are i.i.d. random variables; each $Z_j$ is
the difference between two independent exponential random
variables with means $1/T_B$ and $1/T_{S_1}$, respectively. The
process $\{X_j\}_{j=0}^{\infty}$ is a general random walk with step $Z_j$.
Define $X_0 = 0$.

Now, for every dummy packet transmitted at time $t$ in $\mathcal{Y}_B$, we
insert a virtual packet at $t$ in $\mathcal{Y}_{S_1}$; for every packet dropped
at time $s$ in $\mathcal{Y}_{S_1}$, we insert a virtual packet at $s + \Delta$ in $\mathcal{Y}_B$.
Let the new packet delays after the insertion of virtual packets
be $\{X'_j\}_{j=0}^{\infty}$. It can be shown that $\{X'_j\}_{j=0}^{\infty}$ is also a random
walk with step $Z_j$, but with two absorbing barriers at 0 and
$\Delta$, i.e.,

$$X'_j = \min\big(\max(X'_{j-1} + Z_j,\ 0),\ \Delta\big).$$

Since it is almost surely impossible for $X'_{j-1} + Z_j$ to be
exactly equal to 0 or $\Delta$, each time $X'_j = 0$ corresponds to
a dummy transmission in $\mathcal{Y}_B$, and each time $X'_j = \Delta$ corresponds to a
dropped packet in $\mathcal{Y}_{S_1}$. From Example 2.16 in [30], we know
that the probability of $X'_j = \Delta$ is given by



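$$\Pr\{X' = \Delta\} = \frac{1 - T_B/T_{S_1}}{1 - (T_B/T_{S_1})^2\, e^{-\Delta(T_{S_1} - T_B)}} = \frac{T_{S_1}}{T_B}\, e^{\Delta(T_{S_1} - T_B)}\, \Pr\{X' = 0\}.$$

Therefore, the fraction of dropped packets in $\mathcal{Y}_{S_1}$ is

$$\frac{\Pr\{X' = \Delta\}}{1 - \Pr\{X' = 0\}} = \frac{T_B - T_{S_1}}{T_B\, e^{\Delta(T_B - T_{S_1})} - T_{S_1}}.$$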

By replacing the transmission rates $T_{S_1}, T_B$ with their maximum
values $C_{S_1}, C_B$, the theorem is proved. In [23], the
authors have shown that the BGM algorithm inserts the least
chaff fraction for any pair of point processes. Hence, for any
$(T_{S_1}, T_B)$, it is impossible to obtain a higher information relay
rate than $T_{S_1}\big(1 - \Pr\{X' = \Delta\}/(1 - \Pr\{X' = 0\})\big)$. This procedure can be extended to multihop networks by
considering a multidimensional random walk, but closed-form
evaluation of the relay rates is cumbersome, even for a few
hops.
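The barrier probabilities and the dropped fraction can be checked numerically by running the clipped walk $X'_j$ directly. The Python sketch below is a minimal Monte Carlo check under the proof's assumptions (Poisson schedules, so $Z_j$ is an Exp($T_B$) inter-arrival time minus an Exp($T_{S_1}$) one). The closed forms it compares against are our restatement of the stationary atoms of this walk, $\Pr\{X'=0\} = T_B(T_{S_1}-T_B)/\big(T_{S_1}^2 e^{\Delta(T_{S_1}-T_B)} - T_B^2\big)$ and $\Pr\{X'=\Delta\} = (T_{S_1}/T_B)\,e^{\Delta(T_{S_1}-T_B)}\Pr\{X'=0\}$; the function names and the rates $T_{S_1}=1$, $T_B=1.5$, $\Delta=2$ are hypothetical choices for illustration.

```python
import math
import random


def simulate_bgm_walk(t_s1, t_b, delta, n_steps=200_000, seed=1):
    """Monte Carlo estimate of Pr{X'=0}, Pr{X'=Delta} and the dropped fraction.

    Runs X'_j = min(max(X'_{j-1} + Z_j, 0), delta) with
    Z_j = Exp(rate t_b) - Exp(rate t_s1), as in the proof.
    """
    rng = random.Random(seed)
    x = 0.0
    at_zero = at_delta = 0
    for _ in range(n_steps):
        z = rng.expovariate(t_b) - rng.expovariate(t_s1)
        x = min(max(x + z, 0.0), delta)
        if x == 0.0:
            at_zero += 1    # clipped at 0: the relay sends a dummy packet
        elif x == delta:
            at_delta += 1   # clipped at Delta: a packet from S1 is dropped
    p0, p_delta = at_zero / n_steps, at_delta / n_steps
    return p0, p_delta, p_delta / (1.0 - p0)


def barrier_probs(t_s1, t_b, delta):
    """Closed-form stationary Pr{X'=0}, Pr{X'=Delta} (requires t_s1 != t_b)."""
    a, b = t_b, t_s1
    g = math.exp((b - a) * delta)
    den = b * b * g - a * a
    return a * (b - a) / den, b * (b - a) * g / den


def dropped_fraction(t_s1, t_b, delta):
    """Closed form (T_B - T_S1) / (T_B exp(Delta (T_B - T_S1)) - T_S1)."""
    k = t_b - t_s1
    if abs(k) < 1e-12:
        return 1.0 / (1.0 + t_s1 * delta)  # limit as t_b -> t_s1
    return k / (t_b * math.exp(delta * k) - t_s1)


if __name__ == "__main__":
    p0, p_delta, frac = simulate_bgm_walk(1.0, 1.5, 2.0)
    print("simulated:  ", round(p0, 3), round(p_delta, 3), round(frac, 3))
    p0_cf, pd_cf = barrier_probs(1.0, 1.5, 2.0)
    print("closed form:", round(p0_cf, 3), round(pd_cf, 3),
          round(dropped_fraction(1.0, 1.5, 2.0), 3))
```

With a fixed seed the simulated values should typically agree with the closed forms to within about 0.01; the corresponding relay rate is $T_{S_1}$ times one minus the dropped fraction.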

Proof of Theorem 2

2. The outer bound is obtained using the optimality of the BGM
algorithm. Let node $S_i$ transmit at rate $C_{S_i}$. Then, the sum
information relay rate obtained by using the BGM algorithm
on the joint incoming process, which is a Poisson process of rate $\sum_j C_{S_j}$, is given by

$$\frac{\big(\sum_j C_{S_j}\big)\, C_B \big(e^{\Delta(C_B - \sum_j C_{S_j})} - 1\big)}{C_B\, e^{\Delta(C_B - \sum_j C_{S_j})} - \sum_j C_{S_j}}. \qquad (20)$$

Since BGM inserts the least fraction of dummy packets [23],
this is the maximum sum-rate achievable for the given transmission
rates. For each individual source $S_i$, the best rate
possible is obtained if the other source is completely ignored.
Therefore, by replacing $\sum_j C_{S_j}$ by $C_{S_i}$ in (20), we obtain
the remaining conditions that specify the outer bound. □
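The outer bound can be evaluated numerically for two sources. The sketch below assumes that the BGM relay rate for a Poisson input of rate $c$ into a relay of rate $C_B$ with strict delay $\Delta$ is $c\,C_B\big(e^{\Delta(C_B-c)}-1\big)/\big(C_B e^{\Delta(C_B-c)}-c\big)$, i.e., $c$ times one minus the dropped fraction from the proof of Theorem 1, and applies it to each source alone and to the merged input of rate $C_{S_1}+C_{S_2}$. The helper names are hypothetical.

```python
import math


def bgm_relay_rate(c_in, c_b, delta):
    """Assumed BGM relay rate c_in * (1 - dropped fraction) for a Poisson input."""
    k = c_b - c_in
    if abs(k) < 1e-12:
        return c_in * c_in * delta / (1.0 + c_in * delta)  # limit as c_b -> c_in
    e = math.exp(delta * k)
    return c_in * c_b * (e - 1.0) / (c_b * e - c_in)


def in_outer_bound(r1, r2, c_s1, c_s2, c_b, delta):
    """Check (r1, r2) against the individual caps and the sum-rate cap."""
    return (
        r1 <= bgm_relay_rate(c_s1, c_b, delta)
        and r2 <= bgm_relay_rate(c_s2, c_b, delta)
        # merged independent Poisson schedules form a Poisson process of rate c_s1 + c_s2
        and r1 + r2 <= bgm_relay_rate(c_s1 + c_s2, c_b, delta)
    )


if __name__ == "__main__":
    print(round(bgm_relay_rate(1.0, 1.5, 2.0), 3), round(bgm_relay_rate(2.0, 1.5, 2.0), 3))
    print(in_outer_bound(0.5, 0.5, 1.0, 1.0, 1.5, 2.0), in_outer_bound(0.8, 0.8, 1.0, 1.0, 1.5, 2.0))
```

For example, with $C_{S_1}=C_{S_2}=1$, $C_B=1.5$ and $\Delta=2$, each individual cap is about 0.84 while the sum cap is about 1.31, so $(0.5, 0.5)$ lies inside the bound and $(0.8, 0.8)$ does not.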

1. Let the zero-priority region of Corollary 1 be denoted
by $\mathcal{R}_0$. Every point on the boundary of $\mathcal{R}_0$ is obtained by
letting one node transmit at the highest rate and varying
the transmission rate of the other source node from 0 to
its maximum value $C_{S_j}$. This is a special case of priority
mapping: the reduced rate for a node is equivalent to marking a
fraction of epochs (in a full-rate transmission) to be given equal
priority. If we ignore the unmarked epochs, then the
rate region is identical to that of Corollary 1. However, owing to
unused transmissions in the output schedule, the unmarked epochs
still have a chance of being relayed, and the BGM algorithm
can be applied between the unmarked epochs of the input and the
unused epochs of the output. This successive application of
BGM amounts to time-sharing between the zero-priority and
high-priority strategies. Since a point on the boundary of $\mathcal{R}_0$
has a reduced rate of transmission for one node, it lies strictly
in the interior of the priority achievable rate region. Therefore,
the bounding convex polygon forms an inner bound to the
best achievable rate region. Evaluating the tangents at the
maximum sum-rate point of Corollary 1 yields the expressions
in Theorem 2. □

Proof of Theorem 3

Consider the modified point processes as defined in the
proof of Theorem 1, and let $X'_i$ denote the position of the
random walk between the two absorbing barriers after the $i$th
step, i.e., the delay of the $i$th packet. The average
delay incurred by the BGM algorithm is equal to the mean
value of the random walk, excluding the steps that
hit either barrier. Following the exposition in Example 2.16
in ([30], p. 67), the cumulative distribution of the delay $X'_i$
in the interval $(0, \Delta^*)$ is given by



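$$\Pr\big(X'_i \le x \,\big|\, X'_i \in (0, \Delta^*)\big) = \frac{1 - \exp\big(x(C_{S_1} - C_B)\big)}{1 - \exp\big(\Delta^*(C_{S_1} - C_B)\big)}, \qquad 0 < x < \Delta^*. \qquad (21)$$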
Using the expression above, the average delay $\bar{\Delta}$ of the BGM
algorithm with strict delay $\Delta^*$ can be evaluated as

$$\bar{\Delta} = E\{X'_i \mid X'_i \in (0, \Delta^*)\} = \frac{1 + \exp\big(\Delta^*(C_{S_1} - C_B)\big)\big[\Delta^*(C_{S_1} - C_B) - 1\big]}{(C_B - C_{S_1})\big[1 - \exp\big(\Delta^*(C_{S_1} - C_B)\big)\big]}.$$

If $C_B > C_{S_1}$, then as $\Delta^* \to \infty$,

$$\bar{\Delta} \to \frac{1}{C_B - C_{S_1}}.$$

This implies that if the allowed average delay exceeds $1/(C_B - C_{S_1})$, then the BGM algorithm
with $\Delta^* = \infty$ would be sufficient and, more importantly,
optimal. It is easy to see that for small values of $\Delta^*$, the
average delay $\bar{\Delta} \approx \Delta^*/2$. In other words, when the allowed
delay is very small, relaxing the constraint does not provide
significant improvement. □
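The average-delay expression and its two limits are easy to check numerically. The Python sketch below evaluates $\bar{\Delta}$ in a rearranged form, $\Delta^* e^{u}/(e^{u}-1) - 1/(C_{S_1}-C_B)$ with $u = \Delta^*(C_{S_1}-C_B)$, which is algebraically equal to the closed form for the average delay in this proof; it uses `math.expm1` to avoid overflow and cancellation. The function name and the test values are hypothetical.

```python
import math


def bgm_mean_delay(c_s1, c_b, delta_star):
    """Mean delay of relayed packets under BGM with strict delay delta_star.

    Equal to [1 + e^u (u - 1)] / [(c_s1 - c_b)(e^u - 1)], u = delta_star (c_s1 - c_b),
    rewritten so that large |u| neither overflows nor loses precision.
    """
    theta = c_s1 - c_b
    u = delta_star * theta
    if abs(u) < 1e-6:
        return delta_star * (0.5 + u / 12.0)  # series expansion around u = 0
    if u > 0:
        return delta_star / -math.expm1(-u) - 1.0 / theta
    return delta_star * math.exp(u) / math.expm1(u) - 1.0 / theta


if __name__ == "__main__":
    for d in (0.01, 1.0, 10.0, 60.0):
        print(d, round(bgm_mean_delay(1.0, 1.5, d), 4))
```

With $C_{S_1}=1$ and $C_B=1.5$, the mean delay approaches $1/(C_B-C_{S_1}) = 2$ for large $\Delta^*$ and approaches $\Delta^*/2$ for small $\Delta^*$.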

Proof of Lemma 7

Consider the modified point processes as defined in the
proof of Theorem 1, and let $X'_i$ denote the position of the random
walk between the two absorbing barriers after the $i$th step. Consider a subsequence
$\tilde{X}_i$ of $X'_i$ that contains all points of $X'$ that are
strictly greater than 0. In other words, $\tilde{X}_i$ does not represent
any dummy packets. Accordingly, the erasure variable is
$E(i) = \mathbb{1}\{0 < \tilde{X}_i < \Delta\}$, because a packet is relayed whenever the random
walk does not hit either barrier. Since the point processes
are renewal processes, the resulting random walk is stationary,
with the distribution of $X'_i$ given by (21). Therefore, the erasure process
$E(i)$ is a stationary and ergodic Markov chain, and the capacity
of the erasure channel is given by

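$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} E(i) = 1 - \Pr\{\tilde{X}_i = \Delta\} = 1 - \frac{\Pr\{X'_i = \Delta\}}{1 - \Pr\{X'_i = 0\}} = 1 - \frac{T_B - T_{S_1}}{T_B\, e^{\Delta(T_B - T_{S_1})} - T_{S_1}}.$$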
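The erasure-channel argument can be mirrored in simulation: run the clipped walk, discard the points at 0 (dummy packets), record $E(i)=1$ when $0<\tilde{X}_i<\Delta$, and average. The sketch below does this in Python and compares the average with $1 - (T_B - T_{S_1})/\big(T_B e^{\Delta(T_B-T_{S_1})} - T_{S_1}\big)$, the closed form we assume for the per-packet capacity; the function names, seed, and rates are hypothetical.

```python
import math
import random


def erasure_throughput(t_s1, t_b, delta, n_steps=200_000, seed=2):
    """Empirical (1/n) sum E(i) over the subsequence of walk points that are > 0."""
    rng = random.Random(seed)
    x = 0.0
    real = relayed = 0
    for _ in range(n_steps):
        x = min(max(x + rng.expovariate(t_b) - rng.expovariate(t_s1), 0.0), delta)
        if x > 0.0:            # keep only real packets: points at 0 are dummies
            real += 1
            if x < delta:      # E(i) = 1: the packet is relayed, not erased
                relayed += 1
    return relayed / real


def erasure_capacity(t_s1, t_b, delta):
    """Closed form 1 - (T_B - T_S1) / (T_B exp(Delta (T_B - T_S1)) - T_S1)."""
    k = t_b - t_s1
    if abs(k) < 1e-12:
        return 1.0 - 1.0 / (1.0 + t_s1 * delta)  # limit as t_b -> t_s1
    return 1.0 - k / (t_b * math.exp(delta * k) - t_s1)


if __name__ == "__main__":
    print(round(erasure_throughput(2.0, 1.5, 1.0), 3), round(erasure_capacity(2.0, 1.5, 1.0), 3))
```

When $T_B < T_{S_1}$ and $\Delta$ grows large, the capacity tends to $T_B/T_{S_1}$, i.e., the relay forwards packets at its own rate $T_B$.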



Proof of Theorem 4

From (11), we know that $\boldsymbol{\lambda}(S, \mathcal{B})$ is an achievable relay
rate vector when the nodes in $\mathcal{B}$ are covert. It remains to show
that the condition $H(S \mid \hat{S})/H(S) \ge \alpha$ guarantees an anonymity $\alpha$.
For this purpose, it is sufficient to show that

$$H(S \mid \mathcal{Y}) \ge H(S \mid \hat{S}).$$

Let $\hat{\mathcal{Y}}$ be the schedules generated assuming $\hat{S}$ was a session
and none of the nodes were covert. The transmission rates of
nodes in $\hat{\mathcal{Y}}$ are assumed identical to those in $\mathcal{Y}$. For the nodes that
are sources in $S$, the schedules are independent in both $\mathcal{Y}$ and
$\hat{\mathcal{Y}}$. Session $\hat{S}$ has additional sources due to the broken paths,
which also generate independent transmission schedules. The
set of these additional sources is identical to the set of covert
relays in $S$, whose schedules are therefore independent in $\mathcal{Y}$
as well. Since the remaining nodes relay all received packets
within negligible processing delay, $p(\mathcal{Y} \mid S) = p(\hat{\mathcal{Y}} \mid \hat{S})$. Then,
using the data processing inequality for the Markov chain $S \to \hat{S} \to \hat{\mathcal{Y}}$,

$$H(S \mid \mathcal{Y}) = H(S \mid \hat{\mathcal{Y}}) \ge H(S \mid \hat{S}).$$

□
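The last step of the proof is the data processing inequality: for a Markov chain $S \to \hat{S} \to \hat{\mathcal{Y}}$, conditioning on the further-processed variable cannot reduce the uncertainty about $S$. The small Python check below illustrates this with randomly drawn toy distributions; the three-session prior, the channels, and the helper names are hypothetical and are not derived from any network in this paper.

```python
import math
import random


def random_channel(n_in, n_out, rng):
    """Random row-stochastic matrix p(out | in) with strictly positive entries."""
    rows = []
    for _ in range(n_in):
        w = [rng.random() + 1e-3 for _ in range(n_out)]
        s = sum(w)
        rows.append([v / s for v in w])
    return rows


def cond_entropy(joint):
    """H(A | B) in bits for a joint pmf given as a dict {(a, b): p}."""
    p_b = {}
    for (_, b), p in joint.items():
        p_b[b] = p_b.get(b, 0.0) + p
    return -sum(p * math.log2(p / p_b[b]) for (_, b), p in joint.items() if p > 0)


def dpi_check(seed=0):
    """Return (H(S | Y_hat), H(S | S_hat)) for a toy chain S -> S_hat -> Y_hat."""
    rng = random.Random(seed)
    p_s = [0.2, 0.3, 0.5]                 # hypothetical session prior
    q = random_channel(3, 3, rng)         # hypothetical p(S_hat | S)
    w = random_channel(3, 4, rng)         # hypothetical p(Y_hat | S_hat)
    joint_s_shat, joint_s_y = {}, {}
    for s in range(3):
        for sh in range(3):
            p = p_s[s] * q[s][sh]
            joint_s_shat[(s, sh)] = p
            for y in range(4):
                joint_s_y[(s, y)] = joint_s_y.get((s, y), 0.0) + p * w[sh][y]
    return cond_entropy(joint_s_y), cond_entropy(joint_s_shat)


if __name__ == "__main__":
    print(dpi_check(0))
```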



Proof of Theorem 5

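Consider the optimal solution $q^*(\hat{S} \mid S)$ of the distortion-rate
problem

$$D = \min_{q(\hat{S} \mid S):\ I(S; \hat{S}) \le (1 - \alpha) H(S)} E\big(d(S, \hat{S})\big).$$

From the definition of $d(S, \hat{S})$, it is easy to see that if there is no
$\mathcal{B}$ such that $\hat{S} = T(S, \mathcal{B})$, then $q^*(\hat{S} \mid S) = 0$. Given $S$ and $\hat{S}$, we can
show that the set of covert relays $\mathcal{B}$ is uniquely determined,
using the following argument: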

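Suppose there exist $\mathcal{B}_1 \ne \mathcal{B}_2$ such that $T(S, \mathcal{B}_1) = T(S, \mathcal{B}_2)$. Then,
we can write $\mathcal{B}_1 = (\mathcal{B}, \mathcal{B}'_1)$ and $\mathcal{B}_2 = (\mathcal{B}, \mathcal{B}'_2)$, where $\mathcal{B}'_1 =
(B_{11}, \ldots, B_{1n})$, $\mathcal{B}'_2 = (B_{21}, \ldots, B_{2n})$, and $\mathcal{B}'_1 \cap \mathcal{B}'_2 = \emptyset$.
We know that

$$T(S, \mathcal{B}_1) = T\big(\cdots T(T(S, \mathcal{B}), B_{11}), \ldots, B_{1n}\big) = T\big(\cdots T(T(S, \mathcal{B}), B_{21}), \ldots, B_{2n}\big) = T(S, \mathcal{B}_2).$$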



Suppose none of the paths in $T(S, \mathcal{B})$ contains a node in $\mathcal{B}'_1 \cup \mathcal{B}'_2$;
then it does not matter whether those relays are covert, in
which case the set of covert relays would be $\mathcal{B}$.

Otherwise, without loss of generality, suppose some path $P \in T(S, \mathcal{B})$ contains $B_{11}$. Then $T(S, \mathcal{B}_1)$ would
contain a path that ends in $B_{11}$, whereas $T(S, \mathcal{B}_2)$ cannot
contain such a path. Therefore, we have a contradiction.

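The above argument shows that we can equivalently write
$q^*(\hat{S} \mid S) = q^*(\mathcal{B} \mid S)$. Therefore, $q^*$ specifies a valid selection
strategy. Since $H(S)$ is fixed a priori, $I(S; \hat{S}) \le (1 - \alpha) H(S)$ is equivalent to
$H(S \mid \hat{S}) = H(S) - I(S; \hat{S}) \ge \alpha H(S)$, which, by Theorem 4, ensures that an anonymity $\alpha$ is guaranteed. Further, for every
$\mathcal{B}$, the function $d$ evaluates the difference between the achievable rate
vectors $\boldsymbol{\lambda}(S)$ and $\boldsymbol{\lambda}(S, \mathcal{B})$. Taking the expectation over $q^*(\mathcal{B} \mid S)$,
it is easy to see that the distortion $D$ is achievable with
$\alpha$-anonymity. □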
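The distortion-rate problem in this proof can be solved numerically for small session sets with the Blahut-Arimoto algorithm, a standard technique swapped in here for illustration; it is not a method described in this paper. The sketch below traces the curve through the slope parameter $\beta$ and bisects on $\beta$ until $I(S;\hat{S}) \le (1-\alpha)H(S)$. It assumes a finite distortion matrix; the toy prior (four equally likely sessions) and the Hamming-style distortion (0 for $\hat{S}=S$, 1 otherwise) are hypothetical stand-ins for $d(S,\hat{S})$.

```python
import math


def blahut_arimoto(p, d, beta, iters=300):
    """One point of the distortion-rate curve for slope parameter beta >= 0.

    p: prior over sessions S; d[x][y]: finite distortion of choosing y for x.
    Returns (q, rate_bits, distortion) with q[x][y] approximating q(S_hat | S).
    """
    n_x, n_y = len(p), len(d[0])
    r = [1.0 / n_y] * n_y                       # output marginal of S_hat
    for _ in range(iters):
        q = []
        for x in range(n_x):
            w = [r[y] * math.exp(-beta * d[x][y]) for y in range(n_y)]
            total = sum(w)
            q.append([v / total for v in w])
        r = [sum(p[x] * q[x][y] for x in range(n_x)) for y in range(n_y)]
    rate = sum(p[x] * q[x][y] * math.log2(q[x][y] / r[y])
               for x in range(n_x) for y in range(n_y) if q[x][y] > 0)
    dist = sum(p[x] * q[x][y] * d[x][y] for x in range(n_x) for y in range(n_y))
    return q, rate, dist


def min_distortion_with_anonymity(p, d, alpha, beta_hi=50.0, steps=40):
    """Bisect on beta so that I(S; S_hat) <= (1 - alpha) H(S)."""
    budget = (1.0 - alpha) * -sum(px * math.log2(px) for px in p if px > 0)
    lo, hi = 0.0, beta_hi                       # beta = 0 gives zero rate
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if blahut_arimoto(p, d, mid)[1] <= budget:
            lo = mid
        else:
            hi = mid
    return blahut_arimoto(p, d, lo)


if __name__ == "__main__":
    prior = [0.25] * 4
    hamming = [[0.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
    _, rate, dist = min_distortion_with_anonymity(prior, hamming, alpha=0.5)
    print(round(rate, 4), round(dist, 4))
```

For this toy input with $\alpha = 0.5$ the rate budget is 1 bit, and the returned distortion should be close to 0.189, the point at which the Hamming distortion-rate curve for four equiprobable symbols reaches 1 bit.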

Switching Network Example 

When all relays are visible, the eavesdropper would
not know the final node of any route. This implies that,
given an observation, 4 possible source-destination pairings
would be equally likely, so his uncertainty is
$H(S \mid \mathcal{Y}) = \log(4)$. Since all 24 pairings are equally likely a priori,
$H(S) = \log(24)$. Therefore, when all relays are visible, the anonymity is

$$\frac{H(S \mid \mathcal{Y})}{H(S)} = \frac{\log(4)}{\log(24)} = \log_{24} 4 \approx 0.44.$$

When $M_1$ and $M_3$ are covert, the number of
possible pairings given an observation would
depend on the session. For example, if
$\{(S_1, M_1, M_2, D_1), (S_2, M_1, M_2, D_2), (S_3, M_3, M_4, D_3), (S_4, M_3, M_4, D_4)\}$
is the session, then the eavesdropper
would be able to identify that all transmissions from
$M_1$ are relayed by $M_2$, and his uncertainty would
be $\log(4)$. The same holds for 7 other sessions
(those in which $S_1$ and $S_2$ use the same set of relays). If instead
$\{(S_1, M_1, M_2, D_1), (S_2, M_1, M_4, D_3), (S_3, M_3, M_2, D_2), (S_4, M_3, M_4, D_4)\}$
was the session, then it would be
indistinguishable from the 15 remaining sessions (those in which
$S_1$ and $S_2$ do not use the same set of relays), and his uncertainty
would increase to $\log(16)$. Therefore, since all sessions are
equally probable,

$$H(S \mid \mathcal{Y}) = \tfrac{1}{3}\log(4) + \tfrac{2}{3}\log(16), \qquad \frac{H(S \mid \mathcal{Y})}{H(S)} = \frac{\tfrac{1}{3}\log(4) + \tfrac{2}{3}\log(16)}{\log(24)} \approx 0.73.$$
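The counts in this example can be reproduced by brute-force enumeration of the 24 equally likely pairings. The Python sketch below assumes a topology read off the example (S1 and S2 enter M1, S3 and S4 enter M3, M2 serves D1 and D2, M4 serves D3 and D4) and a simplified observation model: a visible first-hop relay lets the eavesdropper link each source to its second-hop relay, a covert one reveals only its relay-to-relay hops, and the last hop never reveals which destination receives which packet. Both the topology and the observation model are our assumptions, and the names are hypothetical.

```python
import itertools
import math
from collections import Counter

# Hypothetical topology read off the example.
FIRST_HOP = {1: "M1", 2: "M1", 3: "M3", 4: "M3"}    # keyed by source index
SECOND_HOP = {1: "M2", 2: "M2", 3: "M4", 4: "M4"}   # keyed by destination index


def observation(dests, covert):
    """Eavesdropper's view of a session in which source i sends to dests[i-1]."""
    links = []
    for src, dst in zip((1, 2, 3, 4), dests):
        first, second = FIRST_HOP[src], SECOND_HOP[dst]
        if first in covert:
            links.append((first, second))       # only the relay-to-relay hop is seen
        else:
            links.append((f"S{src}", second))   # timing links the source to its second hop
    return tuple(sorted(links))


def anonymity(covert):
    """H(S|Y) / H(S) with all 24 sessions equally likely a priori."""
    sessions = list(itertools.permutations((1, 2, 3, 4)))
    class_sizes = Counter(observation(s, covert) for s in sessions)
    n = len(sessions)
    # Each observation class is uniform over its members, so H(S|Y=y) = log2 |class|.
    h_cond = sum((c / n) * math.log2(c) for c in class_sizes.values())
    return h_cond / math.log2(n)


if __name__ == "__main__":
    print(round(anonymity(set()), 3), round(anonymity({"M1", "M3"}), 3))
```

Under these assumptions the sketch gives roughly 0.44 with all relays visible and roughly 0.73 with $M_1$ and $M_3$ covert.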